Hi Nhat Pham,

Well, in any case let's go forward as below:

(a) For now, let's just document that the saCkptCheckpoint APIs will return SA_AIS_ERR_NOT_EXIST in the headless state. In the future we can look at ways to create more than two replicas.
(b) For the cpnd restart scenario, w.r.t. CPSV-CLM integration, handle the error code received.

Please publish the V3 patch.

-AVM

On 2/25/2016 3:39 PM, Nhat Pham wrote:
> Hi Mahesh and Anders,
>
> Please see my comments below, marked [NhatPham3].
>
> Best regards,
> Nhat Pham
>
> *From:* A V Mahesh [mailto:[email protected]]
> *Sent:* Thursday, February 25, 2016 2:14 PM
> *To:* Nhat Pham <[email protected]>; 'Anders Widell' <[email protected]>
> *Cc:* [email protected]; 'Beatriz Brandao' <[email protected]>; 'Minh Chau H' <[email protected]>
> *Subject:* Re: [PATCH 0 of 1] Review Request for cpsv: Support preserving and recovering checkpoint replicas during headless state V2 [#1621]
>
> Hi Nhat Pham,
>
> Please see my comments.
>
> -AVM
>
> On 2/25/2016 12:07 PM, Nhat Pham wrote:
>
> Hi Mahesh,
>
> Please see my comments below, marked [NhatPham2].
>
> Best regards,
> Nhat Pham
>
> *From:* A V Mahesh [mailto:[email protected]]
> *Sent:* Thursday, February 25, 2016 11:26 AM
> *To:* Nhat Pham <[email protected]>; 'Anders Widell' <[email protected]>
> *Cc:* [email protected]; 'Beatriz Brandao' <[email protected]>; 'Minh Chau H' <[email protected]>
> *Subject:* Re: [PATCH 0 of 1] Review Request for cpsv: Support preserving and recovering checkpoint replicas during headless state V2 [#1621]
>
> Hi Nhat Pham,
>
> Please see my comments below.
>
> -AVM
>
> On 2/25/2016 7:54 AM, Nhat Pham wrote:
>
> Hi Mahesh,
>
> Would you agree with the comments below?
>
> To summarize, the following are the comments so far:
>
> *Comment 1*: This functionality should be guarded by a check of whether the Hydra configuration is enabled in IMM (attrName = const_cast<SaImmAttrNameT>("scAbsenceAllowed")).
>
> Action: The code will be updated accordingly.
> *Comment 2*: To keep the scope of the CPSV service, non-collocated checkpoint creation is NOT_SUPPORTED if the cluster is running with IMMSV_SC_ABSENCE_ALLOWED (the headless-state configuration is enabled at cluster startup; it is currently not configurable, so there is no chance of a run-time configuration change).
>
> Action: No change in code. The CPSV still keeps supporting non-collocated checkpoints even if IMMSV_SC_ABSENCE_ALLOWED is enabled.
>
> >> [AndersW3] No, I think we ought to support non-collocated checkpoints also when IMMSV_SC_ABSENCE_ALLOWED is set. The fact that we have "system controllers" is an implementation detail of OpenSAF. I don't think the CKPT SAF specification implies that non-collocated checkpoints must be fully replicated on all the nodes in the cluster, and thus we must accept the possibility that all replicas are lost. It is not clear exactly what to expect from the APIs when this happens, but you could handle it in a similar way as the case when all sections have been automatically deleted by the checkpoint service because the sections have expired.
>
> [AVM] I am not in agreement with either comment; we cannot handle this in a way similar to the section-expiration case here. In the case of section expiration the checkpoint replica still exists; only the section is deleted.
>
> The CPSV specification says that if two replicas exist (in our case only on the SCs) at a certain point in time, and the nodes hosting both of these replicas are administratively taken out of service, the Checkpoint Service should allocate another replica on another node while these nodes are not available. Please check section `3.1.7.2 Non-Collocated Checkpoints` of the CKPT specification.
>
> For example, take the case of an application on a PL that is in the middle of writing to non-collocated checkpoint sections (the physical replicas exist only on the SCs): what will happen to the application on the PL?
OK, let us consider that the user agreed to lose the checkpoint and wants to recreate it: what will happen to the cpnd DB on the PL, and what about the complexity involved in that (clean-up)? This will lead to a lot of maintainability issues.
>
> On top of that, the CKPT SAF specification only says that a non-collocated checkpoint and all its sections should survive as long as the Checkpoint Service is running in the cluster, and the replica is USER private data (not OpenSAF state); losing any USER private data is not acceptable.
>
> [NhatPham2] According to SAI-AIS-CKPT-B.02.02 (chapter 3.1.8 Persistence of Checkpoints):
>
> "As has been stated in Section 2.1 on page 13, the Checkpoint Service typically stores checkpoint data in the main memory of the nodes. *Regardless of the retention time, a checkpoint and all its sections do not survive if the Checkpoint Service stops running on all nodes hosting replicas for this checkpoint. The stop of the Checkpoint Service can be caused by administrative actions or node failures*."
>
> This states that the checkpoint does not survive in case the nodes hosting its replicas fail (i.e. the SCs in our case).
>
> [AVM] If we read further, section `3.1.7.2 Non-Collocated Checkpoints` explains with an example:
>
> "For example, if two replicas exist at a certain point in time, and the node hosting one of these replicas is administratively taken out of service, the Checkpoint Service may allocate another replica on another node while this node is not available."
>
> [NhatPham3] I think this example is there to support the idea of enhancing the availability of checkpoints by creating multiple replicas. Furthermore, it talks about administrative actions, while the headless state is about multiple node failures.
>
> @Anders: What do you think?
>
> Regarding the case you mentioned about the lost checkpoint: what will happen to the cpnd DB on the PL?
> With this patch the CPND detects unrecoverable checkpoints and deletes them all from the DB when the headless state happens.
>
> [AVM] I know. I was saying that maintaining such a flow, involving the transport `no active timer`, will open up a lot of new issues in CPSV, and this becomes a code-maintainability issue. For example:
>
> 1) If both SCs rejoin quickly (below the `no active timer` timeout, I think), we will end up not deleting the DB; to address this we need to collect evidence to detect that the headless state happened.
>
> [NhatPham3] I'm not sure this is really a case. But if so, this problem impacts the whole system, not just CPSV, regardless of the headless state.
>
> @Anders: What do you think?
>
> *Comment 3*: This is about the case where the checkpoint node director (cpnd) crashes during the headless state. In this case the cpnd can't finish starting because it can't initialize the CLM service. Then, after a timeout, AMF triggers a restart again. Finally, the node is rebooted.
>
> It is expected that this problem should not lead to a node reboot.
>
> Action: No change in code. This is a limitation of the system during the headless state.
>
> [AVM] Code changes are required: the CPSV CLM integration code needs to be revisited to handle TRY_AGAIN.
>
> [NhatPham2] Agree. The CPND code will be updated to re-initialize CLM on the TRY_AGAIN fault code.
>
> If you agree with the summary above, I'll update the code and send out the V3 for review.
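For reference, the TRY_AGAIN handling being agreed on here could be sketched roughly as below. This is an illustration only: the error-code values are the ones from saAis.h, but a stub stands in for the real saClmInitialize (which takes a handle, a callback structure, and a version) so the loop is self-contained.

```c
#include <unistd.h>

/* Subset of the SAF return codes involved here (values as in saAis.h). */
typedef enum {
    SA_AIS_OK = 1,
    SA_AIS_ERR_TRY_AGAIN = 6,
    SA_AIS_ERR_UNAVAILABLE = 31
} SaAisErrorT;

/* Stub standing in for saClmInitialize(): pretend CLM reports
 * TRY_AGAIN twice (no active SC yet) and then succeeds. */
static int stub_calls;
static SaAisErrorT stub_clm_initialize(void)
{
    return (++stub_calls < 3) ? SA_AIS_ERR_TRY_AGAIN : SA_AIS_OK;
}

/* Bounded retry loop: keep re-trying the CLM initialization while it
 * reports TRY_AGAIN, instead of treating the error as fatal (the fatal
 * path is what escalated into the AMF-driven node reboot discussed in
 * this thread). */
SaAisErrorT cpnd_clm_init_with_retry(int max_attempts)
{
    SaAisErrorT rc = SA_AIS_ERR_TRY_AGAIN;
    for (int attempt = 0; attempt < max_attempts; attempt++) {
        rc = stub_clm_initialize();
        if (rc != SA_AIS_ERR_TRY_AGAIN)
            break;
        usleep(100 * 1000); /* back off briefly before retrying */
    }
    return rc;
}
```

In real code the retry would of course sit around the actual saClmInitialize call, and the back-off/attempt limit would need to be balanced against the AMF component registration timer.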
> Best regards,
> Nhat Pham
>
> *From:* Anders Widell [mailto:[email protected]]
> *Sent:* Wednesday, February 24, 2016 9:26 PM
> *To:* Nhat Pham <[email protected]>; 'A V Mahesh' <[email protected]>
> *Cc:* [email protected]; 'Beatriz Brandao' <[email protected]>; 'Minh Chau H' <[email protected]>
> *Subject:* Re: [PATCH 0 of 1] Review Request for cpsv: Support preserving and recovering checkpoint replicas during headless state V2 [#1621]
>
> See my comments inline, marked [AndersW3].
>
> regards,
> Anders Widell
>
> On 02/24/2016 07:32 AM, Nhat Pham wrote:
>
> Hi Mahesh and Anders,
>
> Please see my comments below.
>
> Best regards,
> Nhat Pham
>
> *From:* A V Mahesh [mailto:[email protected]]
> *Sent:* Wednesday, February 24, 2016 11:06 AM
> *To:* Nhat Pham <[email protected]>; 'Anders Widell' <[email protected]>
> *Cc:* [email protected]; 'Beatriz Brandao' <[email protected]>; 'Minh Chau H' <[email protected]>
> *Subject:* Re: [PATCH 0 of 1] Review Request for cpsv: Support preserving and recovering checkpoint replicas during headless state V2 [#1621]
>
> Hi Nhat Pham,
>
> If component (CPND) restart is allowed while the controllers are absent, and we are going to change the return value to SA_AIS_ERR_TRY_AGAIN before requesting CLM, we need to get clarification from the AMF guys on a few things. Why? Because if CPND is stuck on SA_AIS_ERR_TRY_AGAIN and the component restart times out, then AMF will restart the component again (this becomes cyclic), and after the configured saAmfSGCompRestartMax value the node goes for a reboot as the next level of escalation. In that case we may require changes in AMF as well, so as not to act on the component restart timeout while the controllers are absent (I am not
sure whether that is a deviation from the AMF specification).
>
> [Nhat Pham] In the headless state, I'm not sure about this either.
>
> @Anders: Would you have comments about this?
>
> [AndersW3] Ok, first of all I would like to point out that normally, the OpenSAF checkpoint node director should not crash. So we are talking about a situation where multiple faults have occurred: first both the active and the standby system controllers have died, and then shortly afterwards - before we have a new active system controller - the checkpoint node director also crashes. Sure, these may not be totally independent events, but still there are a lot of faults that have happened within a short period of time. We should test the node director and make sure it doesn't crash in this type of scenario.
>
> Now, let's consider the case where we have a fault in the node director that causes it to crash during the headless state. The general philosophy of the headless feature is that when things work fine - i.e. in the absence of faults - we should be able to continue running while the system controllers are absent. However, if a fault happens during the headless state, we may not be able to recover from the fault until there is an active system controller. AMF does provide support for restarting components, but as you have pointed out, the node director will be stuck in a TRY_AGAIN loop immediately after it has been restarted. So this means that if the node director crashes during the headless state, we have lost the checkpoint functionality on that node and we will not get it back until there is an active system controller. Other services like IMM will still work for a while, but AMF will, as you say, eventually escalate the checkpoint node director failure to a node restart and then the whole node is gone. The node will not come back until we have an active system controller.
So to summarize: there is very limited support for recovering from faults that happen during the headless state. The full recovery will not happen until we have an active system controller.
>
> Please do incorporate the current comments (from a design perspective) and republish the patch. I will re-test the V3 patch and provide review comments on functional issues/bugs if I find any.
>
> One important note: in the new patch, let us not have the complexity of allowing non-collocated checkpoint creation and then documenting that in some scenarios the non-collocated checkpoint replicas are not recoverable. Why? Because a replica is USER private data (not OpenSAF state), and losing USER private data is not acceptable. So let us keep the scope of the CPSV service as non-collocated checkpoint creation NOT_SUPPORTED if the cluster is running with IMMSV_SC_ABSENCE_ALLOWED (the headless-state configuration is enabled at cluster startup; it is currently not configurable, so there is no chance of a run-time configuration change).
>
> We can provide support for non-collocated checkpoints in subsequent enhancements with a solution like also creating a replica on the lowest-node-ID PL (a maximum of three replicas in the cluster regardless of where the non-collocated checkpoint is opened).
>
> So for now, regardless of whether the heads (SCs) exist or not, CPSV should return SA_AIS_ERR_NOT_SUPPORTED in an IMMSV_SC_ABSENCE_ALLOWED-enabled cluster, and let us document it as well.
>
> [Nhat Pham] The patch is to limit losing replicas and checkpoints in case of the headless state.
>
> In case both replicas are located on the SCs and they reboot, losing the checkpoint is unpreventable with the current design after the headless state.
>
> Even if we implement the proposal "a maximum of three replicas in the cluster regardless of where the non-collocated checkpoint is opened", there is still the case where the checkpoint is lost. Ex.
The SCs and the PL which hosts the replica reboot at the same time.
>
> In case IMMSV_SC_ABSENCE_ALLOWED is disabled, if both SCs reboot, the whole cluster reboots. Then the checkpoint is lost.
>
> What I mean is that there are cases where the checkpoint is lost. The point is what we can do to limit losing data.
>
> As for the proposal of rejecting the creation of non-collocated checkpoints in case IMMSV_SC_ABSENCE_ALLOWED is enabled, I think this will lead to an incompatibility problem.
>
> @Anders: What do you think about rejecting the creation of non-collocated checkpoints in case IMMSV_SC_ABSENCE_ALLOWED is enabled?
>
> [AndersW3] No, I think we ought to support non-collocated checkpoints also when IMMSV_SC_ABSENCE_ALLOWED is set. The fact that we have "system controllers" is an implementation detail of OpenSAF. I don't think the CKPT SAF specification implies that non-collocated checkpoints must be fully replicated on all the nodes in the cluster, and thus we must accept the possibility that all replicas are lost. It is not clear exactly what to expect from the APIs when this happens, but you could handle it in a similar way as the case when all sections have been automatically deleted by the checkpoint service because the sections have expired.
>
> -AVM
>
> On 2/24/2016 6:51 AM, Nhat Pham wrote:
>
> Hi Mahesh,
>
> Do you have any further comments?
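For what it's worth, the client-side handling Anders describes (treating a checkpoint whose replicas were all lost the same way as one whose sections have expired: re-open/re-create and retry) could be sketched like this. The stubs are placeholders for the real saCkptSectionRead and saCkptCheckpointOpen calls; only the error-handling shape is the point.

```c
#include <stdbool.h>

/* Subset of the SAF return codes involved (values as in saAis.h). */
typedef enum {
    SA_AIS_OK = 1,
    SA_AIS_ERR_NOT_EXIST = 12
} SaAisErrorT;

/* Stub replica state: false models "all replicas were lost while the
 * cluster was headless". */
static bool replica_exists = false;

/* Placeholder for saCkptSectionRead(). */
static SaAisErrorT stub_section_read(void)
{
    return replica_exists ? SA_AIS_OK : SA_AIS_ERR_NOT_EXIST;
}

/* Placeholder for re-opening/re-creating the checkpoint (in real code,
 * saCkptCheckpointOpen with the SA_CKPT_CHECKPOINT_CREATE flag). */
static SaAisErrorT stub_checkpoint_reopen(void)
{
    replica_exists = true; /* re-creation restores a replica */
    return SA_AIS_OK;
}

/* Client pattern: on NOT_EXIST after a headless period, re-create the
 * checkpoint and retry the access, just as a client already has to do
 * when sections were deleted on retention-time expiry. */
SaAisErrorT read_with_recovery(void)
{
    SaAisErrorT rc = stub_section_read();
    if (rc == SA_AIS_ERR_NOT_EXIST && stub_checkpoint_reopen() == SA_AIS_OK)
        rc = stub_section_read();
    return rc;
}
```

Note that the data in the re-created checkpoint is of course gone; this pattern only restores the checkpoint object so the application can continue writing.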
> Best regards,
> Nhat Pham
>
> *From:* A V Mahesh [mailto:[email protected]]
> *Sent:* Monday, February 22, 2016 10:37 AM
> *To:* Nhat Pham <[email protected]>; 'Anders Widell' <[email protected]>
> *Cc:* [email protected]; 'Beatriz Brandao' <[email protected]>; 'Minh Chau H' <[email protected]>
> *Subject:* Re: [PATCH 0 of 1] Review Request for cpsv: Support preserving and recovering checkpoint replicas during headless state V2 [#1621]
>
> Hi,
>
> >> BTW, have you finished the review and test?
>
> I will finish by today.
>
> -AVM
>
> On 2/22/2016 7:48 AM, Nhat Pham wrote:
>
> Hi Mahesh and Anders,
>
> Please see my comment below.
>
> BTW, have you finished the review and test?
>
> Best regards,
> Nhat Pham
>
> *From:* A V Mahesh [mailto:[email protected]]
> *Sent:* Friday, February 19, 2016 2:28 PM
> *To:* Nhat Pham <[email protected]>; 'Anders Widell' <[email protected]>; 'Minh Chau H' <[email protected]>
> *Cc:* [email protected]; 'Beatriz Brandao' <[email protected]>
> *Subject:* Re: [PATCH 0 of 1] Review Request for cpsv: Support preserving and recovering checkpoint replicas during headless state V2 [#1621]
>
> Hi Nhat Pham,
>
> On 2/19/2016 12:28 PM, Nhat Pham wrote:
>
> Could you please give more detailed information about the steps to reproduce the problem below? Thanks.
>
> Don't see this as a specific bug; we need to look at the issue from the point of view of a CLM-integrated service. Considering Anders Widell's explanation of CLM application behavior during the headless state, we need to reintegrate CPND with CLM (before this headless-state feature there was no case of CPND existing in the absence of CLMD, but now there is).
> And this will be consistent across all the services integrated with CLM (you may need some changes in CLM also).
>
> [Nhat Pham] I think CLM should return SA_AIS_ERR_TRY_AGAIN in this case.
>
> @Anders: What do you think?
>
> To start with, let us consider the case where CPND is restarted on a payload (PL) during the headless state while an application is running on that PL.
>
> [Nhat Pham] Regarding the CPND as a CLM application, I'm not sure what it can do in this case. In case it restarts, it is monitored by AMF. If it blocks for too long, AMF will also trigger a node reboot.
>
> In my test case, the CPND gets blocked by CLM. It doesn't get out of saClmInitialize. How do you get the "ER cpnd clm init failed with return value:31"?
>
> Following is the cpnd trace:
>
> Feb 22 8:56:41.188122 osafckptnd [736:cpnd_init.c:0183] >> cpnd_lib_init
> Feb 22 8:56:41.188332 osafckptnd [736:cpnd_init.c:0412] >> cpnd_cb_db_init
> Feb 22 8:56:41.188600 osafckptnd [736:cpnd_init.c:0437] << cpnd_cb_db_init
> Feb 22 8:56:41.188778 osafckptnd [736:clma_api.c:0503] >> saClmInitialize
> Feb 22 8:56:41.188945 osafckptnd [736:clma_api.c:0593] >> clmainitialize
> Feb 22 8:56:41.190052 osafckptnd [736:clma_util.c:0100] >> clma_startup: clma_use_count: 0
> Feb 22 8:56:41.190273 osafckptnd [736:clma_mds.c:1124] >> clma_mds_init
> Feb 22 8:56:41.190825 osafckptnd [736:clma_mds.c:1170] << clma_mds_init
>
> -AVM
>
> On 2/19/2016 12:28 PM, Nhat Pham wrote:
>
> Hi Mahesh,
>
> Could you please give more detailed information about the steps to reproduce the problem below? Thanks.
> Best regards,
> Nhat Pham
>
> *From:* A V Mahesh [mailto:[email protected]]
> *Sent:* Friday, February 19, 2016 1:06 PM
> *To:* Anders Widell <[email protected]>; Nhat Pham <[email protected]>; 'Minh Chau H' <[email protected]>
> *Cc:* [email protected]; 'Beatriz Brandao' <[email protected]>
> *Subject:* Re: [PATCH 0 of 1] Review Request for cpsv: Support preserving and recovering checkpoint replicas during headless state V2 [#1621]
>
> Hi Anders Widell,
> Thanks for the detailed explanation about CLM during the headless state.
>
> Hi Nhat Pham,
>
> Comment 3:
> Please see below the problem I was describing; I am now seeing it in the absence of CLMD (during the headless state). So CPND/CLMA need to address the case below: currently the cpnd CLM init fails with return value SA_AIS_ERR_UNAVAILABLE, but it should be SA_AIS_ERR_TRY_AGAIN.
>
> ==================================================
> Feb 19 11:18:28 PL-4 osafimmnd[5422]: NO NODE STATE-> IMM_NODE_FULLY_AVAILABLE 17418
> Feb 19 11:18:28 PL-4 osafimmloadd: NO Sync ending normally
> Feb 19 11:18:28 PL-4 osafimmnd[5422]: NO Epoch set to 9 in ImmModel
> Feb 19 11:18:28 PL-4 cpsv_app: IN Received PROC_STALE_CLIENTS
> Feb 19 11:18:28 PL-4 osafimmnd[5422]: NO Implementer connected: 42 (MsgQueueService132111) <108, 2040f>
> Feb 19 11:18:28 PL-4 osafimmnd[5422]: NO Implementer connected: 43 (MsgQueueService131855) <0, 2030f>
> Feb 19 11:18:28 PL-4 osafimmnd[5422]: NO Implementer connected: 44 (safLogService) <0, 2010f>
> Feb 19 11:18:28 PL-4 osafimmnd[5422]: NO SERVER STATE: IMM_SERVER_SYNC_SERVER --> IMM_SERVER_READY
> Feb 19 11:18:28 PL-4 osafimmnd[5422]: NO Implementer connected: 45 (safClmService) <0, 2010f>
> *Feb 19 11:18:28 PL-4 osafckptnd[7718]: ER cpnd clm init failed with return value:31
> Feb 19 11:18:28 PL-4
osafckptnd[7718]: ER cpnd init failed
> Feb 19 11:18:28 PL-4 osafckptnd[7718]: ER cpnd_lib_req FAILED
> Feb 19 11:18:28 PL-4 osafckptnd[7718]: __init_cpnd() failed*
> Feb 19 11:18:28 PL-4 osafclmna[5432]: NO safNode=PL-4,safCluster=myClmCluster Joined cluster, nodeid=2040f
> Feb 19 11:18:28 PL-4 osafamfnd[5441]: NO AVD NEW_ACTIVE, adest:1
> Feb 19 11:18:28 PL-4 osafamfnd[5441]: NO Sending node up due to NCSMDS_NEW_ACTIVE
> Feb 19 11:18:28 PL-4 osafamfnd[5441]: NO 1 SISU states sent
> Feb 19 11:18:28 PL-4 osafamfnd[5441]: NO 1 SU states sent
> Feb 19 11:18:28 PL-4 osafamfnd[5441]: NO 7 CSICOMP states synced
> Feb 19 11:18:28 PL-4 osafamfnd[5441]: NO 7 SU states sent
> Feb 19 11:18:28 PL-4 osafimmnd[5422]: NO Implementer connected: 46 (safAmfService) <0, 2010f>
> Feb 19 11:18:30 PL-4 osafamfnd[5441]: NO 'safSu=PL-4,safSg=NoRed,safApp=OpenSAF' Component or SU restart probation timer expired
> Feb 19 11:18:35 PL-4 osafamfnd[5441]: NO Instantiation of 'safComp=CPND,safSu=PL-4,safSg=NoRed,safApp=OpenSAF' failed
> Feb 19 11:18:35 PL-4 osafamfnd[5441]: NO Reason: component registration timer expired
> Feb 19 11:18:35 PL-4 osafamfnd[5441]: WA 'safComp=CPND,safSu=PL-4,safSg=NoRed,safApp=OpenSAF' Presence State RESTARTING => INSTANTIATION_FAILED
> Feb 19 11:18:35 PL-4 osafamfnd[5441]: NO Component Failover trigerred for 'safSu=PL-4,safSg=NoRed,safApp=OpenSAF': Failed component: 'safComp=CPND,safSu=PL-4,safSg=NoRed,safApp=OpenSAF'
> Feb 19 11:18:35 PL-4 osafamfnd[5441]: ER 'safComp=CPND,safSu=PL-4,safSg=NoRed,safApp=OpenSAF' got Inst failed
> Feb 19 11:18:35 PL-4 osafamfnd[5441]: Rebooting OpenSAF NodeId = 132111 EE Name = , Reason: NCS component Instantiation failed, OwnNodeId = 132111, SupervisionTime = 60
> Feb 19 11:18:36 PL-4 opensaf_reboot: Rebooting local node; timeout=60
> Feb 19 11:18:39 PL-4 kernel: [ 4877.338518] md: stopping all md devices.
> ==================================================
>
> -AVM
>
> On 2/15/2016 5:11 PM, Anders Widell wrote:
>
> Hi!
>
> Please find my answer inline, marked [AndersW].
>
> regards,
> Anders Widell
>
> On 02/15/2016 10:38 AM, Nhat Pham wrote:
>
> Hi Mahesh,
>
> It's good. Thank you. :)
>
> [AVM] Upon rejoining of the SCs, the replica should be re-created regardless of whether another application opens it on PL4. (Note: this comment is based on your explanation; I have not yet reviewed/tested it. Currently I am struggling with the SCs not rejoining after the headless state; I can provide you more on this once I complete my review/testing.)
>
> [Nhat] To make cloud resilience work, you need the patches from the other services (log, amf, clm, ntf).
> @Minh: I heard that you created a tar file which includes all patches. Could you please send it to Mahesh? Thanks.
>
> [AVM] I understand that. Before I comment more on this, please allow me to understand: I am still not very clear on the headless design in detail. For example, the cluster membership of the PLs during the headless state: in the absence of the SCs (CLMD), are the PLs considered cluster nodes or not (cluster membership)?
>
> [Nhat] I don't know much about this.
> @Anders: Could you please comment on this? Thanks.
>
> [AndersW] First of all, keep in mind that the "headless" state should ideally not last a very long time. Once we have the spare SC feature in place (ticket [#79]), a new SC should become active within a matter of a few seconds after we have lost both the active and the standby SC.
>
> I think you should view the state of the cluster in the headless state in the same way as you view the state of the cluster during a failover between the active and the standby SC. Imagine that the active SC dies. It takes the standby SC 1.5 seconds to detect the failure of the active SC (this is due to the TIPC timeout).
If you have configured the PROMOTE_ACTIVE_TIMER, there is an additional delay before the standby takes over as active. What is the state of the cluster during the time after the active SC failed and before the standby takes over?
>
> The state of the cluster while it is headless is very similar. The difference is that this state may last a little bit longer (though not more than a few seconds, until one of the spare SCs becomes active). Another difference is that we may have lost some state. With a "perfect" implementation of the headless feature we should not lose any state at all, but with the current set of patches we do lose state.
>
> So specifically, if we talk about cluster membership and ask the question: is a particular PL a member of the cluster or not during the headless state? Well, if you ask CLM about this during the headless state, then you will not know - because CLM doesn't provide any service during the headless state. If you keep retrying your query to CLM, you will eventually get an answer - but you will not get this answer until there is an active SC again and we have exited the headless state. When viewed in this way, the answer to the question about a node's membership is undefined during the headless state, since CLM will not provide you with any answer until there is an active SC.
>
> However, if you asked CLM about the node's cluster membership status before the cluster went headless, you probably saved a cached copy of the cluster membership state. Maybe you also installed a CLM track callback and intend to update this cached copy every time the cluster membership status changes. The question then is: can you continue using this cached copy of the cluster membership state during the headless state?
The answer is YES: since CLM doesn't provide any service during the headless state, it also means that the cluster membership view cannot change during this time. Nodes can of course reboot or die, but CLM will not notice and hence the cluster view will not be updated. You can argue that this is bad because the cluster view doesn't reflect reality, but notice that this will always be the case. We can never propagate information instantaneously, and detection of node failures will take 1.5 seconds due to the TIPC timeout. You can never be sure that a node is alive at this very moment just because CLM tells you that it is a member of the cluster. If we are unfortunate enough to lose both system controller nodes simultaneously, updates to the cluster membership view will be delayed a few seconds longer than usual.
>
> Best regards,
> Nhat Pham
>
> -----Original Message-----
> From: A V Mahesh [mailto:[email protected]]
> Sent: Monday, February 15, 2016 11:19 AM
> To: Nhat Pham <[email protected]>; [email protected]
> Cc: [email protected]; 'Beatriz Brandao' <[email protected]>
> Subject: Re: [PATCH 0 of 1] Review Request for cpsv: Support preserving and recovering checkpoint replicas during headless state V2 [#1621]
>
> Hi Nhat Pham,
>
> How did your holiday go?
>
> Please find my comments below.
>
> On 2/15/2016 8:43 AM, Nhat Pham wrote:
>
> Hi Mahesh,
>
> For comment 1, the patch will be updated accordingly.
>
> [AVM] Please hold; I will provide more comments this week, so we can have a consolidated V3.
>
> For comment 2, I think the CKPT service will not be backward compatible if scAbsenceAllowed is true. The client can't create a non-collocated checkpoint on the SCs.
> Furthermore, this solution only protects the CKPT service from the case "the non-collocated checkpoint is created on an SC"; there are still cases where the replicas are completely lost. Ex:
>
> - The non-collocated checkpoint is created on a PL. The PL reboots. Both replicas now reside on the SCs. Then the headless state happens. All replicas are lost.
> - The non-collocated checkpoint has its active replica on a PL and this PL restarts during the headless state.
> - The non-collocated checkpoint is created on PL3. This checkpoint is also opened on PL4. Then the SCs and PL3 reboot.
>
> [AVM] Upon rejoining of the SCs, the replica should be re-created regardless of whether another application opens it on PL4. (Note: this comment is based on your explanation; I have not yet reviewed/tested it. Currently I am struggling with the SCs not rejoining after the headless state; I can provide you more on this once I complete my review/testing.)
>
> In this case, all replicas are lost and the client has to create the checkpoint again.
>
> In case multiple nodes (including the SCs) reboot, losing replicas is unpreventable. The patch is to recover the checkpoints in the cases where that is possible. What do you think?
>
> [AVM] I understand that. Before I comment more on this, please allow me to understand: I am still not very clear on the headless design in detail.
>
> For example, the cluster membership of the PLs during the headless state: in the absence of the SCs (CLMD), are the PLs considered cluster nodes or not (cluster membership)?
> - If they are not considered cluster nodes, the Checkpoint Service API should leverage the SA Forum Cluster Membership Service, and the APIs can fail with SA_AIS_ERR_UNAVAILABLE.
>
> - If they are considered cluster nodes, we need to follow all the rules defined in the SAI-AIS-CKPT-B.02.02 specification.
>
> So give me some more time to review it completely, so that we can have a consolidated patch V3.
>
> -AVM
>
> Best regards,
> Nhat Pham
>
> -----Original Message-----
> From: A V Mahesh [mailto:[email protected]]
> Sent: Friday, February 12, 2016 11:10 AM
> To: Nhat Pham <[email protected]>; [email protected]
> Cc: [email protected]; Beatriz Brandao <[email protected]>
> Subject: Re: [PATCH 0 of 1] Review Request for cpsv: Support preserving and recovering checkpoint replicas during headless state V2 [#1621]
>
> Comment 2:
>
> After incorporating comment one, all the limitations should be prevented based on whether the Hydra configuration is enabled in IMM.
>
> For example: if some application is trying to create a non-collocated checkpoint whose active replica would be generated/located on an SC, then regardless of whether the heads (SCs) exist or not, it should return SA_AIS_ERR_NOT_SUPPORTED.
>
> In other words, rather than allowing a non-collocated checkpoint to be created while the heads (SCs) exist, and the non-collocated checkpoint becoming unrecoverable after the heads (SCs) rejoin:
>
> =============================================================================
> Limitation: The CKPT service doesn't support recovering checkpoints in the following cases:
> . The checkpoint which is unlinked before headless.
> . The non-collocated checkpoint has its active replica located on an SC.
> .
The non-collocated checkpoint has its active replica located on a PL and this PL restarts during the headless state.
>
> In these cases, the checkpoint replica is destroyed. The fault code SA_AIS_ERR_BAD_HANDLE is returned when the client accesses the checkpoint in these cases. The client must re-open the checkpoint.
> =============================================================================
>
> -AVM
>
> On 2/11/2016 12:52 PM, A V Mahesh wrote:
>
> Hi,
>
> I just started reviewing the patch. I will give comments as soon as I come across any, to save some time.
>
> Comment 1:
> This functionality should be guarded by a check of whether the Hydra configuration is enabled in IMM (attrName = const_cast<SaImmAttrNameT>("scAbsenceAllowed")).
>
> Please see, for example, how the LOG/AMF services implemented it.
>
> -AVM
>
> On 1/29/2016 1:02 PM, Nhat Pham wrote:
>
> Hi Mahesh,
>
> As described in the README, the CKPT service returns the SA_AIS_ERR_TRY_AGAIN fault code in this case. I guess it's the same for other services.
>
> @Anders: Could you please confirm this?
>
> Best regards,
> Nhat Pham
>
> -----Original Message-----
> From: A V Mahesh [mailto:[email protected]]
> Sent: Friday, January 29, 2016 2:11 PM
> To: Nhat Pham <[email protected]>; [email protected]
> Cc: [email protected]
> Subject: Re: [PATCH 0 of 1] Review Request for cpsv: Support preserving and recovering checkpoint replicas during headless state V2 [#1621]
>
> Hi,
>
> On 1/29/2016 11:45 AM, Nhat Pham wrote:
>
> - The behavior of the application will be consistent with other SAF services, like IMM/AMF behavior during the headless state.
> [Nhat] I'm not clear what you mean by "consistent"?
> > > In the absence of the Directors (SCs), what are the expected
> > > return values of the SAF APIs (for all services) that are not in a
> > > position to provide service at that moment?
> > >
> > > I think all services should return the same SAF errors. I think we
> > > currently don't have this; maybe Anders Widell will help us.
> > >
> > > -AVM
> > >
> > > On 1/29/2016 11:45 AM, Nhat Pham wrote:
> > > > Hi Mahesh,
> > > >
> > > > Please see the attachment for the README. Let me know if there is
> > > > any more information required.
> > > >
> > > > Regarding your comments:
> > > > - During the headless state, applications may behave like during
> > > >   the CPND restart case.
> > > > [Nhat] Headless state and CPND restart are different events.
> > > > Thus, the behavior is different.
> > > > Headless state is a case where both SCs go down.
> > > >
> > > > - The behavior of the application will be consistent with other
> > > >   SAF services, like imm/amf behavior during the headless state.
> > > > [Nhat] I'm not clear what you mean by "consistent"?
> > > >
> > > > Best regards,
> > > > Nhat Pham
> > > >
> > > > -----Original Message-----
> > > > From: A V Mahesh [mailto:[email protected]]
> > > > Sent: Friday, January 29, 2016 11:12 AM
> > > > To: Nhat Pham <[email protected]>; [email protected]
> > > > Cc: [email protected]
> > > > Subject: Re: [PATCH 0 of 1] Review Request for cpsv: Support
> > > > preserving and recovering checkpoint replicas during headless
> > > > state V2 [#1621]
> > > >
> > > > Hi Nhat Pham,
> > > >
> > > > I started reviewing this patch, so could you please provide a
> > > > README file with the scope and limitations? That will help to
> > > > define the testing/reviewing scope.
> > > >
> > > > The following are the minimum things we can keep in mind while
> > > > reviewing/accepting the patch:
> > > >
> > > > - Not affecting existing functionality.
> > > > - During the headless state, applications may behave like during
> > > >   the CPND restart case.
> > > > - The minimum functionality of the application works.
> > > > - The behavior of the application will be consistent with other
> > > >   SAF services, like imm/amf behavior during the headless state.
> > > >
> > > > So please do provide any additional details in the README if any
> > > > of the above is deviated from, to allow users to know about the
> > > > limitations/deviations.
> > > >
> > > > -AVM
> > > >
> > > > On 1/4/2016 3:15 PM, Nhat Pham wrote:
> > > > > Summary: cpsv: Support preserving and recovering checkpoint
> > > > > replicas during headless state [#1621]
> > > > > Review request for Trac Ticket(s): #1621
> > > > > Peer Reviewer(s): [email protected];
> > > > >                   [email protected]
> > > > > Pull request to: [email protected]
> > > > > Affected branch(es): default
> > > > > Development branch: default
> > > > >
> > > > > --------------------------------
> > > > > Impacted area       Impact y/n
> > > > > --------------------------------
> > > > > Docs                    n
> > > > > Build system            n
> > > > > RPM/packaging           n
> > > > > Configuration files     n
> > > > > Startup scripts         n
> > > > > SAF services            y
> > > > > OpenSAF services        n
> > > > > Core libraries          n
> > > > > Samples                 n
> > > > > Tests                   n
> > > > > Other                   n
> > > > >
> > > > > Comments (indicate scope for each "y" above):
> > > > > ---------------------------------------------
> > > > >
> > > > > changeset faec4a4445a4c23e8f630857b19aabb43b5af18d
> > > > > Author: Nhat Pham <[email protected]>
> > > > > Date: Mon, 04 Jan 2016 16:34:33 +0700
> > > > >
> > > > > cpsv: Support preserving and recovering checkpoint replicas
> > > > > during headless state [#1621]
> > > > >
> > > > > Background:
> > > > > -----------
> > > > > This enhancement supports preserving checkpoint replicas in
> > > > > case both SCs go down (headless state) and recovering replicas
> > > > > in case one of the SCs comes up again.
> > > > > If both SCs go down, checkpoint replicas on surviving nodes
> > > > > still remain. When an SC is available again, surviving replicas
> > > > > are automatically registered to the SC checkpoint database.
> > > > > Content in surviving replicas is intact and synchronized to new
> > > > > replicas.
> > > > >
> > > > > When no SC is available, client API calls changing checkpoint
> > > > > configuration, which require SC communication, are rejected.
> > > > > Client API calls reading and writing existing checkpoint
> > > > > replicas still work.
> > > > >
> > > > > Limitation: The CKPT service does not support recovering
> > > > > checkpoints in the following cases:
> > > > > - The checkpoint which is unlinked before headless.
> > > > > - The non-collocated checkpoint whose active replica is
> > > > >   located on an SC.
> > > > > - The non-collocated checkpoint whose active replica is
> > > > >   located on a PL, and this PL restarts during the headless
> > > > >   state.
> > > > > In these cases, the checkpoint replica is destroyed. The fault
> > > > > code SA_AIS_ERR_BAD_HANDLE is returned when the client
> > > > > accesses the checkpoint, and the client must re-open the
> > > > > checkpoint.
> > > > >
> > > > > While in the headless state, accessing checkpoint replicas
> > > > > does not work if the node which hosts the active replica goes
> > > > > down. It will be back working when an SC is available again.
> > > > >
> > > > > Solution:
> > > > > ---------
> > > > > The solution for this enhancement includes 2 parts:
> > > > >
> > > > > 1. To destroy the un-recoverable checkpoints described above
> > > > >    when both SCs are down: When both SCs are down, the CPND
> > > > >    deletes un-recoverable checkpoint nodes and replicas on
> > > > >    PLs. Then it requests the CPA to destroy the corresponding
> > > > >    checkpoint node by using the new message
> > > > >    CPA_EVT_ND2A_CKPT_DESTROY.
> > > > >
> > > > > 2. To update the CPD with checkpoint information: When an
> > > > >    active SC is up after headless, the CPND will update the
> > > > >    CPD with checkpoint information by using the new message
> > > > >    CPD_EVT_ND2D_CKPT_INFO_UPDATE instead of
> > > > >    CPD_EVT_ND2D_CKPT_CREATE. This is because the CPND will
> > > > >    create a new ckpt_id for the checkpoint, which might be
> > > > >    different from the current ckpt id, if
> > > > >    CPD_EVT_ND2D_CKPT_CREATE is used. The CPD collects
> > > > >    checkpoint information within 6s. During this update time,
> > > > >    the following requests are rejected with fault code
> > > > >    SA_AIS_ERR_TRY_AGAIN:
> > > > >    - CPD_EVT_ND2D_CKPT_CREATE
> > > > >    - CPD_EVT_ND2D_CKPT_UNLINK
> > > > >    - CPD_EVT_ND2D_ACTIVE_SET
> > > > >    - CPD_EVT_ND2D_CKPT_RDSET
> > > > >
> > > > > Complete diffstat:
> > > > > ------------------
> > > > > osaf/libs/agents/saf/cpa/cpa_proc.c        |   52 ++++
> > > > > osaf/libs/common/cpsv/cpsv_edu.c           |   43 +++
> > > > > osaf/libs/common/cpsv/include/cpd_cb.h     |    3 +
> > > > > osaf/libs/common/cpsv/include/cpd_imm.h    |    1 +
> > > > > osaf/libs/common/cpsv/include/cpd_proc.h   |    7 +
> > > > > osaf/libs/common/cpsv/include/cpd_tmr.h    |    3 +-
> > > > > osaf/libs/common/cpsv/include/cpnd_cb.h    |    1 +
> > > > > osaf/libs/common/cpsv/include/cpnd_init.h  |    2 +
> > > > > osaf/libs/common/cpsv/include/cpsv_evt.h   |   20 ++
> > > > > osaf/services/saf/cpsv/cpd/Makefile.am     |    3 +-
> > > > > osaf/services/saf/cpsv/cpd/cpd_evt.c       |  229 ++++++++++
> > > > > osaf/services/saf/cpsv/cpd/cpd_imm.c       |  112 ++++++
> > > > > osaf/services/saf/cpsv/cpd/cpd_init.c      |   20 +-
> > > > > osaf/services/saf/cpsv/cpd/cpd_proc.c      |  309 ++++++++++++
> > > > > osaf/services/saf/cpsv/cpd/cpd_tmr.c       |    7 +
> > > > > osaf/services/saf/cpsv/cpnd/cpnd_db.c      |   16 +
> > > > > osaf/services/saf/cpsv/cpnd/cpnd_evt.c     |   22 ++
> > > > > osaf/services/saf/cpsv/cpnd/cpnd_init.c    |   23 +-
> > > > > osaf/services/saf/cpsv/cpnd/cpnd_mds.c     |   13 +
> > > > > osaf/services/saf/cpsv/cpnd/cpnd_proc.c    |  314 ++++++++---
> > > > > 20 files changed, 1189 insertions(+), 11 deletions(-)
> > > > >
> > > > > Testing Commands:
> > > > > -----------------
> > > > > -
> > > > >
> > > > > Testing, Expected Results:
> > > > > --------------------------
> > > > > -
> > > > >
> > > > > Conditions of Submission:
> > > > > -------------------------
> > > > > <<HOW MANY DAYS BEFORE PUSHING, CONSENSUS ETC>>
> > > > >
> > > > > Arch        Built  Started  Linux distro
> > > > > -------------------------------------------
> > > > > mips          n      n
> > > > > mips64        n      n
> > > > > x86           n      n
> > > > > x86_64        n      n
> > > > > powerpc       n      n
> > > > > powerpc64     n      n
> > > > >
> > > > > Reviewer Checklist:
> > > > > -------------------
> > > > > [Submitters: make sure that your review doesn't trigger any
> > > > > checkmarks!]
> > > > >
> > > > > Your checkin has not passed review because (see checked
> > > > > entries):
> > > > >
> > > > > ___ Your RR template is generally incomplete; it has too many
> > > > >     blank entries that need proper data filled in.
> > > > > ___ You have failed to nominate the proper persons for review
> > > > >     and push.
> > > > > ___ Your patches do not have proper short+long headers.
> > > > > ___ You have grammar/spelling in your header that is
> > > > >     unacceptable.
> > > > > ___ You have exceeded a sensible line length in your
> > > > >     headers/comments/text.
> > > > > ___ You have failed to put a proper Trac Ticket # into your
> > > > >     commits.
> > > > > ___ You have incorrectly put/left internal data in your
> > > > >     comments/files (i.e. internal bug tracking tool IDs,
> > > > >     product names etc).
> > > > > ___ You have not given any evidence of testing beyond basic
> > > > >     build tests. Demonstrate some level of runtime or other
> > > > >     sanity testing.
> > > > > ___ You have ^M present in some of your files. These have to
> > > > >     be removed.
> > > > > ___ You have needlessly changed whitespace or added whitespace
> > > > >     crimes like trailing spaces, or spaces before tabs.
> > > > > ___ You have mixed real technical changes with whitespace and
> > > > >     other cosmetic code cleanup changes. These have to be
> > > > >     separate commits.
> > > > > ___ You need to refactor your submission into logical chunks;
> > > > >     there is too much content in a single commit.
> > > > > ___ You have extraneous garbage in your review (merge commits
> > > > >     etc).
> > > > > ___ You have giant attachments which should never have been
> > > > >     sent; instead you should place your content in a public
> > > > >     tree to be pulled.
> > > > > ___ You have too many commits attached to an e-mail; resend as
> > > > >     threaded commits, or place in a public tree for a pull.
> > > > > ___ You have resent this content multiple times without a
> > > > >     clear indication of what has changed between each re-send.
> > > > > ___ You have failed to adequately and individually address all
> > > > >     of the comments and change requests that were proposed in
> > > > >     the initial review.
> > > > > ___ You have a misconfigured ~/.hgrc file (i.e. username,
> > > > >     email etc).
> > > > > ___ Your computer has a badly configured date and time,
> > > > >     confusing the threaded patch review.
> > > > > ___ Your changes affect the IPC mechanism, and you don't
> > > > >     present any results for the in-service upgradability test.
> > > > > ___ Your changes affect the user manual and documentation;
> > > > >     your patch series does not contain the patch that updates
> > > > >     the Doxygen manual.
> ------------------------------------------------------------------------------
_______________________________________________
Opensaf-devel mailing list
[email protected]
https://lists.sourceforge.net/lists/listinfo/opensaf-devel
