Hi Mahesh,

For comment 1, I will update the patch accordingly.

For comment 2, I think the CKPT service will not be backward compatible if
scAbsenceAllowed is true, since the client then cannot create a
non-collocated checkpoint on the SCs.

Furthermore, this solution only protects the CKPT service from the case "the
non-collocated checkpoint is created on an SC".
There are still cases where the replicas are completely lost, e.g.:

- The non-collocated checkpoint is created on a PL and the PL reboots. Both
replicas are now located on SCs. Then the headless state occurs and all
replicas are lost.
- The non-collocated checkpoint has its active replica located on a PL, and
this PL restarts during the headless state.
- The non-collocated checkpoint is created on PL3 and is also opened on PL4.
Then the SCs and PL3 reboot.
In this case, all replicas are lost and the client has to create the
checkpoint again.
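
To make these cases concrete, here is a minimal sketch in C. It is a toy model, not code from the patch: the names node, has_replica, down, is_headless and checkpoint_survives are all hypothetical. It assumes, as the third case implies, that opening a non-collocated checkpoint on PL4 does not place a replica there. Under that assumption a checkpoint can be recovered only if at least one replica sits on a node that stayed up.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical model: a node is a system controller (SC) or a payload (PL). */
enum node_type { NODE_SC, NODE_PL };

struct node {
    const char *name;
    enum node_type type;
    bool has_replica; /* node hosts a replica of the checkpoint */
    bool down;        /* node went down in the failure scenario */
};

/* Headless state: no SC in the list is up. */
static bool is_headless(const struct node *nodes, size_t n)
{
    assert(nodes != NULL);
    for (size_t i = 0; i < n; i++) {
        if (nodes[i].type == NODE_SC && !nodes[i].down)
            return false;
    }
    return true;
}

/* The checkpoint is recoverable only if some replica is on a node that stayed up. */
static bool checkpoint_survives(const struct node *nodes, size_t n)
{
    assert(nodes != NULL);
    for (size_t i = 0; i < n; i++) {
        if (nodes[i].has_replica && !nodes[i].down)
            return true;
    }
    return false;
}
```

With the third case encoded as replicas on SC1, SC2 and PL3 (all down) and no replica on PL4, checkpoint_survives returns false, which matches the conclusion that the client has to create the checkpoint again.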

If multiple nodes (including the SCs) reboot, losing replicas is
unavoidable. The patch aims to recover the checkpoints in the cases where
recovery is possible.
What do you think?

Best regards,
Nhat Pham

-----Original Message-----
From: A V Mahesh [mailto:[email protected]]
Sent: Friday, February 12, 2016 11:10 AM
To: Nhat Pham <[email protected]>; [email protected]
Cc: [email protected]; Beatriz Brandao 
<[email protected]>
Subject: Re: [PATCH 0 of 1] Review Request for cpsv: Support preserving and 
recovering checkpoint replicas during headless state V2 [#1621]


Comment 2:

After incorporating comment 1, all the limitations should be prevented
whenever the Hydra configuration (scAbsenceAllowed) is enabled in IMM.

For example: if an application tries to create a non-collocated checkpoint
whose active replica would be located on an SC, then, regardless of whether
the heads (SCs) are present or not, the call should return
SA_AIS_ERR_NOT_SUPPORTED.

In other words, this is preferable to allowing a non-collocated checkpoint
to be created while the heads (SCs) exist, only for that checkpoint to become
unrecoverable after the heads (SCs) rejoin.
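
A rough sketch of the check described above, in C. This is only an illustration and not taken from the patch: create_req, check_create_allowed and sc_absence_allowed are hypothetical names, and the SaAisErrorT enum below is a local stand-in (real code would use the definitions from saAis.h).

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Local stand-ins for the SA Forum error codes named in this thread. */
typedef enum {
    SA_AIS_OK = 1,
    SA_AIS_ERR_NOT_SUPPORTED = 19
} SaAisErrorT;

/* Hypothetical summary of a checkpoint create request. */
struct create_req {
    bool collocated;   /* checkpoint is created as collocated */
    bool active_on_sc; /* active replica would be located on an SC */
};

/*
 * Reject the unrecoverable combination up front when scAbsenceAllowed is
 * enabled, whether or not the SCs are currently present.
 */
static SaAisErrorT check_create_allowed(bool sc_absence_allowed,
                                        const struct create_req *req)
{
    assert(req != NULL);
    if (sc_absence_allowed && !req->collocated && req->active_on_sc)
        return SA_AIS_ERR_NOT_SUPPORTED;
    return SA_AIS_OK;
}
```

Note that with sc_absence_allowed set to false the function always returns SA_AIS_OK, so existing behavior would be unchanged in that configuration.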

=============================================================================================
>       Limitation: The CKPT service doesn't support recovering checkpoints in
>       the following cases:
>       . The checkpoint is unlinked before the headless state.
>       . The non-collocated checkpoint has its active replica located on an SC.
>       . The non-collocated checkpoint has its active replica located on a PL
>       and this PL restarts during the headless state.
>       In these cases, the checkpoint replica is destroyed. The fault code
>       SA_AIS_ERR_BAD_HANDLE is returned when the client accesses the
>       checkpoint in these cases. The client must re-open the checkpoint.
=============================================================================================
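
From the client side, the limitation quoted above could be handled roughly as in the following C sketch. It is only an illustration under stated assumptions: ckpt_op_fn, access_with_recovery, fake_ckpt, fake_read and fake_reopen are hypothetical names, the callbacks stand in for whatever checkpoint access and re-open calls the application uses, and the SaAisErrorT enum is a local stand-in for the definitions in saAis.h.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Local stand-ins for the SA Forum error codes named in this thread. */
typedef enum {
    SA_AIS_OK = 1,
    SA_AIS_ERR_TRY_AGAIN = 6,
    SA_AIS_ERR_BAD_HANDLE = 9
} SaAisErrorT;

/* Hypothetical callback type wrapping one checkpoint operation. */
typedef SaAisErrorT (*ckpt_op_fn)(void *ctx);

/*
 * Try the access up to max_attempts times. On BAD_HANDLE, re-open the
 * checkpoint and try again; on TRY_AGAIN, simply try again. A real client
 * would also wait between attempts.
 */
static SaAisErrorT access_with_recovery(ckpt_op_fn access_fn,
                                        ckpt_op_fn reopen_fn,
                                        void *ctx, int max_attempts)
{
    SaAisErrorT rc = SA_AIS_ERR_TRY_AGAIN;
    assert(access_fn != NULL && reopen_fn != NULL);
    for (int attempt = 0; attempt < max_attempts; attempt++) {
        rc = access_fn(ctx);
        if (rc == SA_AIS_ERR_BAD_HANDLE) {
            SaAisErrorT reopen_rc = reopen_fn(ctx);
            if (reopen_rc != SA_AIS_OK && reopen_rc != SA_AIS_ERR_TRY_AGAIN)
                return reopen_rc;
            continue; /* rc stays BAD_HANDLE if the attempts run out */
        }
        if (rc != SA_AIS_ERR_TRY_AGAIN)
            return rc;
    }
    return rc;
}

/* Simulated checkpoint whose handle became invalid during headless state. */
struct fake_ckpt {
    bool handle_valid;
    int reopen_calls;
};

static SaAisErrorT fake_read(void *ctx)
{
    struct fake_ckpt *c = ctx;
    return c->handle_valid ? SA_AIS_OK : SA_AIS_ERR_BAD_HANDLE;
}

static SaAisErrorT fake_reopen(void *ctx)
{
    struct fake_ckpt *c = ctx;
    c->handle_valid = true;
    c->reopen_calls++;
    return SA_AIS_OK;
}
```

In the simulation, a client whose handle was invalidated gets SA_AIS_ERR_BAD_HANDLE on the first read, re-opens once, and the second read succeeds.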

-AVM


On 2/11/2016 12:52 PM, A V Mahesh wrote:
> Hi,
>
> I just started reviewing the patch. To save some time, I will send
> comments as soon as I come across any.
>
> Comment 1 :
> This functionality should be enabled only if the Hydra configuration is
> enabled in IMM, i.e. attrName =
> const_cast<SaImmAttrNameT>("scAbsenceAllowed")
>
> Please see how the LOG/AMF services implemented it.
>
> -AVM
>
>
> On 1/29/2016 1:02 PM, Nhat Pham wrote:
>> Hi Mahesh,
>>
>> As described in the README, the CKPT service returns the
>> SA_AIS_ERR_TRY_AGAIN fault code in this case.
>> I guess it's the same for other services.
>>
>> @Anders: Could you please confirm this?
>>
>> Best regards,
>> Nhat Pham
>>
>> -----Original Message-----
>> From: A V Mahesh [mailto:[email protected]]
>> Sent: Friday, January 29, 2016 2:11 PM
>> To: Nhat Pham <[email protected]>; [email protected]
>> Cc: [email protected]
>> Subject: Re: [PATCH 0 of 1] Review Request for cpsv: Support
>> preserving and recovering checkpoint replicas during headless state
>> V2 [#1621]
>>
>> Hi,
>>
>> On 1/29/2016 11:45 AM, Nhat Pham wrote:
>>>     -  The behavior of application will be consistent with other saf
>>> services like imm/amf behavior  during headless state.
>>> [Nhat] I'm not clear what you mean about "consistent"?
>> In the absence of the Directors (SCs), what return values are expected
>> from the SAF APIs (of all services)
>> that are not in a position to provide service at that moment?
>>
>> I think all services should return the same SAF errors. I think
>> we currently don't have that; maybe Anders Widel will help us.
>>
>> -AVM
>>
>>
>> On 1/29/2016 11:45 AM, Nhat Pham wrote:
>>> Hi Mahesh,
>>>
>>> Please see the attachment for the README. Let me know if there is
>>> any more information required.
>>>
>>> Regarding your comments:
>>>     -  during headless state  applications may behave like during
>>> CPND restart case
>>> [Nhat] Headless state and CPND restart are different events. Thus, the
>>> behavior is different. Headless state is a case where both SCs go down.
>>>
>>>     -  The behavior of application will be consistent with other saf
>>> services like imm/amf behavior  during headless state.
>>> [Nhat] I'm not clear what you mean by "consistent"?
>>>
>>> Best regards,
>>> Nhat Pham
>>>
>>> -----Original Message-----
>>> From: A V Mahesh [mailto:[email protected]]
>>> Sent: Friday, January 29, 2016 11:12 AM
>>> To: Nhat Pham <[email protected]>; [email protected]
>>> Cc: [email protected]
>>> Subject: Re: [PATCH 0 of 1] Review Request for cpsv: Support
>>> preserving and recovering checkpoint replicas during headless state
>>> V2 [#1621]
>>>
>>> Hi Nhat Pham,
>>>
>>> I started reviewing this patch, so can you please provide a README file
>>> with scope and limitations? That will help to define the
>>> testing/reviewing scope.
>>>
>>> Following are the minimum things we can keep in mind while
>>> reviewing/accepting the patch:
>>>
>>> - Not affecting existing functionality
>>>     -  During headless state, applications may behave as in the
>>> CPND restart case
>>>     -  The minimum functionality of the application works
>>>     -  The behavior of the application will be consistent with
>>>        other SAF services like IMM/AMF during headless state.
>>>
>>> So please provide additional details in the README if the patch
>>> deviates from any of the above; that allows users to know about the
>>> limitations/deviations.
>>>
>>> -AVM
>>>
>>> On 1/4/2016 3:15 PM, Nhat Pham wrote:
>>>> Summary: cpsv: Support preserving and recovering checkpoint
>>>> replicas during headless state [#1621]
>>>> Review request for Trac Ticket(s): #1621
>>>> Peer Reviewer(s): [email protected]; [email protected]
>>>> Pull request to: [email protected]
>>>> Affected branch(es): default
>>>> Development branch: default
>>>>
>>>> --------------------------------
>>>> Impacted area       Impact y/n
>>>> --------------------------------
>>>>     Docs                    n
>>>>     Build system            n
>>>>     RPM/packaging           n
>>>>     Configuration files     n
>>>>     Startup scripts         n
>>>>     SAF services            y
>>>>     OpenSAF services        n
>>>>     Core libraries          n
>>>>     Samples                 n
>>>>     Tests                   n
>>>>     Other                   n
>>>>
>>>>
>>>> Comments (indicate scope for each "y" above):
>>>> ---------------------------------------------
>>>>
>>>> changeset faec4a4445a4c23e8f630857b19aabb43b5af18d
>>>> Author:    Nhat Pham <[email protected]>
>>>> Date:    Mon, 04 Jan 2016 16:34:33 +0700
>>>>
>>>>     cpsv: Support preserving and recovering checkpoint replicas
>>>>     during headless state [#1621]
>>>>
>>>>     Background:
>>>>     ----------
>>>>     This enhancement supports preserving checkpoint replicas in case
>>>>     both SCs are down (headless state) and recovering replicas when one
>>>>     of the SCs is up again. If both SCs go down, checkpoint replicas on
>>>>     surviving nodes still remain. When an SC is available again,
>>>>     surviving replicas are automatically registered in the SC checkpoint
>>>>     database. The content of surviving replicas is kept intact and
>>>>     synchronized to new replicas.
>>>>
>>>>     When no SC is available, client API calls that change the checkpoint
>>>>     configuration, which requires SC communication, are rejected. Client
>>>>     API calls reading and writing existing checkpoint replicas still work.
>>>>
>>>>     Limitation: The CKPT service does not support recovering checkpoints
>>>>     in the following cases:
>>>>      - The checkpoint is unlinked before the headless state.
>>>>      - The non-collocated checkpoint has its active replica located on
>>>>        an SC.
>>>>      - The non-collocated checkpoint has its active replica located on
>>>>        a PL and this PL restarts during the headless state.
>>>>     In these cases, the checkpoint replica is destroyed. The fault code
>>>>     SA_AIS_ERR_BAD_HANDLE is returned when the client accesses the
>>>>     checkpoint in these cases. The client must re-open the checkpoint.
>>>>
>>>>     While in headless state, accessing checkpoint replicas does not work
>>>>     if the node which hosts the active replica goes down. It will work
>>>>     again when an SC is available again.
>>>>
>>>>     Solution:
>>>>     ---------
>>>>     The solution for this enhancement includes 2 parts:
>>>>
>>>>     1. To destroy the un-recoverable checkpoints described above when
>>>>     both SCs are down: When both SCs are down, the CPND deletes
>>>>     un-recoverable checkpoint nodes and replicas on PLs. Then it requests
>>>>     the CPA to destroy the corresponding checkpoint node by using the new
>>>>     message CPA_EVT_ND2A_CKPT_DESTROY.
>>>>
>>>>     2. To update the CPD with checkpoint information: When an active SC
>>>>     is up after headless, the CPND will update the CPD with checkpoint
>>>>     information by using the new message CPD_EVT_ND2D_CKPT_INFO_UPDATE
>>>>     instead of CPD_EVT_ND2D_CKPT_CREATE. This is because, if
>>>>     CPD_EVT_ND2D_CKPT_CREATE is used, the CPND will create a new ckpt_id
>>>>     for the checkpoint, which might differ from the current ckpt_id.
>>>>     The CPD collects checkpoint information within 6s. During this
>>>>     update window, the following requests are rejected with fault code
>>>>     SA_AIS_ERR_TRY_AGAIN:
>>>>     - CPD_EVT_ND2D_CKPT_CREATE
>>>>     - CPD_EVT_ND2D_CKPT_UNLINK
>>>>     - CPD_EVT_ND2D_ACTIVE_SET
>>>>     - CPD_EVT_ND2D_CKPT_RDSET
>>>>
>>>>
>>>> Complete diffstat:
>>>> ------------------
>>>>     osaf/libs/agents/saf/cpa/cpa_proc.c       |   52
>>>>     osaf/libs/common/cpsv/cpsv_edu.c          |   43
>>>>     osaf/libs/common/cpsv/include/cpd_cb.h    |    3
>>>>     osaf/libs/common/cpsv/include/cpd_imm.h   |    1
>>>>     osaf/libs/common/cpsv/include/cpd_proc.h  |    7
>>>>     osaf/libs/common/cpsv/include/cpd_tmr.h   |    3
>>>>     osaf/libs/common/cpsv/include/cpnd_cb.h   |    1
>>>>     osaf/libs/common/cpsv/include/cpnd_init.h |    2
>>>>     osaf/libs/common/cpsv/include/cpsv_evt.h  |   20
>>>>     osaf/services/saf/cpsv/cpd/Makefile.am    |    3
>>>>     osaf/services/saf/cpsv/cpd/cpd_evt.c      |  229
>>>>     osaf/services/saf/cpsv/cpd/cpd_imm.c      |  112
>>>>     osaf/services/saf/cpsv/cpd/cpd_init.c     |   20
>>>>     osaf/services/saf/cpsv/cpd/cpd_proc.c     |  309
>>>>     osaf/services/saf/cpsv/cpd/cpd_tmr.c      |    7
>>>>     osaf/services/saf/cpsv/cpnd/cpnd_db.c     |   16
>>>>     osaf/services/saf/cpsv/cpnd/cpnd_evt.c    |   22
>>>>     osaf/services/saf/cpsv/cpnd/cpnd_init.c   |   23
>>>>     osaf/services/saf/cpsv/cpnd/cpnd_mds.c    |   13
>>>>     osaf/services/saf/cpsv/cpnd/cpnd_proc.c   |  314
>>>>     20 files changed, 1189 insertions(+), 11 deletions(-)
>>>>
>>>>
>>>> Testing Commands:
>>>> -----------------
>>>> -
>>>>
>>>> Testing, Expected Results:
>>>> --------------------------
>>>> -
>>>>
>>>>
>>>> Conditions of Submission:
>>>> -------------------------
>>>>     <<HOW MANY DAYS BEFORE PUSHING, CONSENSUS ETC>>
>>>>
>>>>
>>>> Arch      Built     Started    Linux distro
>>>> -------------------------------------------
>>>> mips        n          n
>>>> mips64      n          n
>>>> x86         n          n
>>>> x86_64      n          n
>>>> powerpc     n          n
>>>> powerpc64   n          n
>>>>
>>>>
>>>> Reviewer Checklist:
>>>> -------------------
>>>> [Submitters: make sure that your review doesn't trigger any
>>>> checkmarks!]
>>>>
>>>>
>>>> Your checkin has not passed review because (see checked entries):
>>>>
>>>> ___ Your RR template is generally incomplete; it has too many blank
>>>>        entries that need proper data filled in.
>>>>
>>>> ___ You have failed to nominate the proper persons for review and
>>>>        push.
>>>>
>>>> ___ Your patches do not have proper short+long header
>>>>
>>>> ___ You have grammar/spelling in your header that is unacceptable.
>>>>
>>>> ___ You have exceeded a sensible line length in your
>>>>        headers/comments/text.
>>>>
>>>> ___ You have failed to put in a proper Trac Ticket # into your
>>>>        commits.
>>>>
>>>> ___ You have incorrectly put/left internal data in your comments/files
>>>>        (i.e. internal bug tracking tool IDs, product names etc)
>>>>
>>>> ___ You have not given any evidence of testing beyond basic build
>>>>        tests. Demonstrate some level of runtime or other sanity testing.
>>>>
>>>> ___ You have ^M present in some of your files. These have to be
>>>>        removed.
>>>>
>>>> ___ You have needlessly changed whitespace or added whitespace crimes
>>>>        like trailing spaces, or spaces before tabs.
>>>>
>>>> ___ You have mixed real technical changes with whitespace and other
>>>>        cosmetic code cleanup changes. These have to be separate
>>>>        commits.
>>>>
>>>> ___ You need to refactor your submission into logical chunks; there is
>>>>        too much content into a single commit.
>>>>
>>>> ___ You have extraneous garbage in your review (merge commits etc)
>>>>
>>>> ___ You have giant attachments which should never have been sent;
>>>>        Instead you should place your content in a public tree to be
>>>>        pulled.
>>>>
>>>> ___ You have too many commits attached to an e-mail; resend as
>>>>        threaded commits, or place in a public tree for a pull.
>>>>
>>>> ___ You have resent this content multiple times without a clear
>>>>        indication of what has changed between each re-send.
>>>>
>>>> ___ You have failed to adequately and individually address all of the
>>>>        comments and change requests that were proposed in the
>>>>        initial review.
>>>>
>>>> ___ You have a misconfigured ~/.hgrc file (i.e. username, email
>>>>        etc)
>>>>
>>>> ___ Your computer has a badly configured date and time; confusing
>>>>        the threaded patch review.
>>>>
>>>> ___ Your changes affect IPC mechanism, and you don't present any
>>>>        results for in-service upgradability test.
>>>>
>>>> ___ Your changes affect user manual and documentation, your patch
>>>>        series do not contain the patch that updates the Doxygen manual.
>>>>
>>
>


