Hi Nhat Pham,
How is your holiday went
Please find my comments below
On 2/15/2016 8:43 AM, Nhat Pham wrote:
> Hi Mahesh,
>
> For the comment 1, the patch will be updated accordingly.
[AVM] Please hold , I will provide more comments in this week , so we
can have consolidated V3
>
> For the comment 2, I think the CKPT service will not be backward compatible if
> the scAbsenceAllowed is true.
> The client can't create non-collocated checkpoint on SCs.
>
> Furthermore, this solution only protects the CKPT service from the case "The
> non-collocated checkpoint is created on a SC"
> there are still the cases where the replicas are completely lost. Ex:
>
> - The non-collocated checkpoint created on a PL. The PL reboots. Both replicas
> now locate on SCs. Then, headless state happens. All replicas are lost.
> - The non-collocated checkpoint has active replica locating on a PL and this
> PL restarts during headless state
> - The non-collocated checkpoint is created on PL3. This checkpoint is also
> opened on PL4. Then SCs and PL3 reboot.
[AVM] Up on rejoining of the SC`s The replica should be re-created
regardless of another application opens it on PL4.
( Note : this comment is based on your explanation have
not yet reviewed/tested ,
currently i am struggling with SC`s not rejoining
after headless state , i can provide you more on this once i complte my
review/testing)
> In this case, all replicas are lost and the client has to create it again.
>
> In case multiple nodes (which including SCs) reboot, losing replicas is
> unpreventable. The patch is to recover the checkpoints in possible cases.
> How do you think?
[AVM] I understand that , before I comment more on this please allow
me to understand
I am not still not very clear of the headless design in detail.
For example cluster membership of PL`s during headless
state ,
In the absence of SC`s (CLMD) dose the PLs is considered
as cluster nodes or not (cluster membership) ?
- if not consider as NON cluster nodes Checkpoint
Service API should leverage the SA Forum Cluster
Membership Service and API's can fail with
SA_AIS_ERR_UNAVAILABLE
- if considers as cluster nodes we need to follow
all the defined rules which are defined in SAI-AIS-CKPT-B.02.02
specification
so give me some more time to review it completely , so that
we can have consolidated patch V3
-AVM
>
> Best regards,
> Nhat Pham
>
> -----Original Message-----
> From: A V Mahesh [mailto:[email protected]]
> Sent: Friday, February 12, 2016 11:10 AM
> To: Nhat Pham <[email protected]>; [email protected]
> Cc: [email protected]; Beatriz Brandao
> <[email protected]>
> Subject: Re: [PATCH 0 of 1] Review Request for cpsv: Support preserving and
> recovering checkpoint replicas during headless state V2 [#1621]
>
>
> Comment 2 :
>
> After incorporating the comment one all the Limitations should be prevented
> based on Hydra configuration is enabled in IMM status.
>
> Foe example : if some application is trying to create
>
> non-collocated checkpoint active replica getting generated/locating on SC then
> ,regardless of the heads (SC`s) status exist not exist should return
> SA_AIS_ERR_NOT_SUPPORTED
>
> In other words, rather that allowing to created non-collocated checkpoint when
> heads(SC`s) are exit , and non-collocated checkpoint getting unrecoverable
> after heads(SC`s) rejoins.
>
> =============================================================================================
>> Limitation: The CKPT service doesn't support recovering checkpoints in
>> following cases:
>> . The checkpoint which is unlinked before headless.
>> . The non-collocated checkpoint has active replica locating on SC.
>> . The non-collocated checkpoint has active replica locating on a PL and
>> this PL
>> restarts during headless state. In this cases, the checkpoint replica is
>> destroyed. The fault code SA_AIS_ERR_BAD_HANDLE is returned when the
>> client
>> accesses the checkpoint in these cases. The client must re-open the
>> checkpoint.
> =============================================================================================
>
> -AVM
>
>
> On 2/11/2016 12:52 PM, A V Mahesh wrote:
>> Hi,
>>
>> I jut starred reviewing patch , I will be giving comments as soon as
>> I crossover any , to save some time.
>>
>> Comment 1 :
>> This functionality should be under checks if Hydra configuration is
>> enabled in IMM attrName =
>> const_cast<SaImmAttrNameT>("scAbsenceAllowed")
>>
>> Please see example how LOG/AMF services implemented it.
>>
>> -AVM
>>
>>
>> On 1/29/2016 1:02 PM, Nhat Pham wrote:
>>> Hi Mahesh,
>>>
>>> As described in the README, the CKPT service returns
>>> SA_AIS_ERR_TRY_AGAIN fault code in this case.
>>> I guess it's same for other services.
>>>
>>> @Anders: Could you please confirm this?
>>>
>>> Best regards,
>>> Nhat Pham
>>>
>>> -----Original Message-----
>>> From: A V Mahesh [mailto:[email protected]]
>>> Sent: Friday, January 29, 2016 2:11 PM
>>> To: Nhat Pham <[email protected]>; [email protected]
>>> Cc: [email protected]
>>> Subject: Re: [PATCH 0 of 1] Review Request for cpsv: Support
>>> preserving and recovering checkpoint replicas during headless state
>>> V2 [#1621]
>>>
>>> Hi,
>>>
>>> On 1/29/2016 11:45 AM, Nhat Pham wrote:
>>>> - The behavior of application will be consistent with other saf
>>>> services like imm/amf behavior during headless state.
>>>> [Nhat] I'm not clear what you mean about "consistent"?
>>> In the obscene of Director (SC's) , what is expected return values
>>> of SAF API should ( all services ) ,
>>> which are not in aposition to provide service at that moment.
>>>
>>> I think all services should return same SAF ERRS., I thinks
>>> currently we don't have it , may be Anders Widel will help us.
>>>
>>> -AVM
>>>
>>>
>>> On 1/29/2016 11:45 AM, Nhat Pham wrote:
>>>> Hi Mahesh,
>>>>
>>>> Please see the attachment for the README. Let me know if there is
>>>> any more information required.
>>>>
>>>> Regarding your comments:
>>>> - during headless state applications may behave like during
>>>> CPND restart case [Nhat] Headless state and CPND restart are
>>>> different events. Thus, the behavior is different.
>>>> Headless state is a case where both SCs go down.
>>>>
>>>> - The behavior of application will be consistent with other saf
>>>> services like imm/amf behavior during headless state.
>>>> [Nhat] I'm not clear what you mean about "consistent"?
>>>>
>>>> Best regards,
>>>> Nhat Pham
>>>>
>>>> -----Original Message-----
>>>> From: A V Mahesh [mailto:[email protected]]
>>>> Sent: Friday, January 29, 2016 11:12 AM
>>>> To: Nhat Pham <[email protected]>; [email protected]
>>>> Cc: [email protected]
>>>> Subject: Re: [PATCH 0 of 1] Review Request for cpsv: Support
>>>> preserving and recovering checkpoint replicas during headless state
>>>> V2 [#1621]
>>>>
>>>> Hi Nhat Pham,
>>>>
>>>> I stared reviewing this patch , so can please provide README file
>>>> with scope and limitations , that will help to define
>>>> testing/reviewing scope .
>>>>
>>>> Following are minimum things we can keep in mind while
>>>> reviewing/accepting patch ,
>>>>
>>>> - Not effecting existing functionality
>>>> - during headless state applications may behave like during
>>>> CPND restart case
>>>> - The minimum functionally of application works
>>>> - The behavior of application will be consistent with
>>>> other saf services like imm/amf behavior during headless state.
>>>>
>>>> So please do provide any additional detailed in README if any of the
>>>> above is deviated , that allow users to know about the
>>>> limitations/deviation.
>>>>
>>>> -AVM
>>>>
>>>> On 1/4/2016 3:15 PM, Nhat Pham wrote:
>>>>> Summary: cpsv: Support preserving and recovering checkpoint
>>>>> replicas during headless state [#1621] Review request for Trac Ticket(s):
>>>>> #1621 Peer Reviewer(s): [email protected];
>>>>> [email protected] Pull request to: [email protected]
>>>>> Affected branch(es): default Development branch: default
>>>>>
>>>>> --------------------------------
>>>>> Impacted area Impact y/n
>>>>> --------------------------------
>>>>> Docs n
>>>>> Build system n
>>>>> RPM/packaging n
>>>>> Configuration files n
>>>>> Startup scripts n
>>>>> SAF services y
>>>>> OpenSAF services n
>>>>> Core libraries n
>>>>> Samples n
>>>>> Tests n
>>>>> Other n
>>>>>
>>>>>
>>>>> Comments (indicate scope for each "y" above):
>>>>> ---------------------------------------------
>>>>>
>>>>> changeset faec4a4445a4c23e8f630857b19aabb43b5af18d
>>>>> Author: Nhat Pham <[email protected]>
>>>>> Date: Mon, 04 Jan 2016 16:34:33 +0700
>>>>>
>>>>> cpsv: Support preserving and recovering checkpoint replicas
>>>>> during headless state [#1621]
>>>>>
>>>>> Background:
>>>>> ---------- This enhancement supports to preserve checkpoint
>>>>> replicas
>>>> in case
>>>>> both SCs down (headless state) and recover replicas in case one
>>>>> of
>>>> SCs up
>>>>> again. If both SCs goes down, checkpoint replicas on surviving
>>>>> nodes
>>>> still
>>>>> remain. When a SC is available again, surviving replicas are
>>>> automatically
>>>>> registered to the SC checkpoint database. Content in surviving
>>>> replicas are
>>>>> intacted and synchronized to new replicas.
>>>>>
>>>>> When no SC is available, client API calls changing checkpoint
>>>> configuration
>>>>> which requires SC communication, are rejected. Client API calls
>>>> reading and
>>>>> writing existing checkpoint replicas still work.
>>>>>
>>>>> Limitation: The CKPT service does not support recovering
>>>>> checkpoints
>>>> in
>>>>> following cases:
>>>>> - The checkpoint which is unlinked before headless.
>>>>> - The non-collocated checkpoint has active replica locating on
>>>>> SC.
>>>>> - The non-collocated checkpoint has active replica locating on
>>>>> a PL
>>>> and this
>>>>> PL restarts during headless state. In this cases, the
>>>>> checkpoint
>>>> replica is
>>>>> destroyed. The fault code SA_AIS_ERR_BAD_HANDLE is returned
>>>>> when the
>>>> client
>>>>> accesses the checkpoint in these cases. The client must re-open
>>>>> the
>>>>> checkpoint.
>>>>>
>>>>> While in headless state, accessing checkpoint replicas does not
>>>>> work
>>>> if the
>>>>> node which hosts the active replica goes down. It will back
>>>>> working
>>>> when a
>>>>> SC available again.
>>>>>
>>>>> Solution:
>>>>> --------- The solution for this enhancement includes 2 parts:
>>>>>
>>>>> 1. To destroy un-recoverable checkpoint described above when
>>>>> both
>>>> SCs are
>>>>> down: When both SCs are down, the CPND deletes un-recoverable
>>>> checkpoint
>>>>> nodes and replicas on PLs. Then it requests CPA to destroy
>>>> corresponding
>>>>> checkpoint node by using new message CPA_EVT_ND2A_CKPT_DESTROY
>>>>>
>>>>> 2. To update CPD with checkpoint information When an active SC
>>>>> is up
>>>> after
>>>>> headless, CPND will update CPD with checkpoint information by
>>>>> using
>>>> new
>>>>> message CPD_EVT_ND2D_CKPT_INFO_UPDATE instead of using
>>>>> CPD_EVT_ND2D_CKPT_CREATE. This is because the CPND will create
>>>>> new
>>>> ckpt_id
>>>>> for the checkpoint which might be different with the current
>>>>> ckpt id
>>>> if the
>>>>> CPD_EVT_ND2D_CKPT_CREATE is used. The CPD collects checkpoint
>>>> information
>>>>> within 6s. During this updating time, following requests is
>>>>> rejected
>>>> with
>>>>> fault code SA_AIS_ERR_TRY_AGAIN:
>>>>> - CPD_EVT_ND2D_CKPT_CREATE
>>>>> - CPD_EVT_ND2D_CKPT_UNLINK
>>>>> - CPD_EVT_ND2D_ACTIVE_SET
>>>>> - CPD_EVT_ND2D_CKPT_RDSET
>>>>>
>>>>>
>>>>> Complete diffstat:
>>>>> ------------------
>>>>> osaf/libs/agents/saf/cpa/cpa_proc.c | 52
>>>> +++++++++++++++++++++++++++++++++++
>>>>> osaf/libs/common/cpsv/cpsv_edu.c | 43
>>>> +++++++++++++++++++++++++++++
>>>>> osaf/libs/common/cpsv/include/cpd_cb.h | 3 ++
>>>>> osaf/libs/common/cpsv/include/cpd_imm.h | 1 +
>>>>> osaf/libs/common/cpsv/include/cpd_proc.h | 7 ++++
>>>>> osaf/libs/common/cpsv/include/cpd_tmr.h | 3 +-
>>>>> osaf/libs/common/cpsv/include/cpnd_cb.h | 1 +
>>>>> osaf/libs/common/cpsv/include/cpnd_init.h | 2 +
>>>>> osaf/libs/common/cpsv/include/cpsv_evt.h | 20 +++++++++++++
>>>>> osaf/services/saf/cpsv/cpd/Makefile.am | 3 +-
>>>>> osaf/services/saf/cpsv/cpd/cpd_evt.c | 229
>>>> ++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
>>>>
>>>> ++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
>>>>
>>>> ++++
>>>>> osaf/services/saf/cpsv/cpd/cpd_imm.c | 112
>>>> ++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
>>>>
>>>>> osaf/services/saf/cpsv/cpd/cpd_init.c | 20 ++++++++++++-
>>>>> osaf/services/saf/cpsv/cpd/cpd_proc.c | 309
>>>> ++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
>>>>
>>>> ++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
>>>>
>>>> +++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
>>>>> osaf/services/saf/cpsv/cpd/cpd_tmr.c | 7 ++++
>>>>> osaf/services/saf/cpsv/cpnd/cpnd_db.c | 16 ++++++++++
>>>>> osaf/services/saf/cpsv/cpnd/cpnd_evt.c | 22 +++++++++++++++
>>>>> osaf/services/saf/cpsv/cpnd/cpnd_init.c | 23 ++++++++++++++-
>>>>> osaf/services/saf/cpsv/cpnd/cpnd_mds.c | 13 ++++++++
>>>>> osaf/services/saf/cpsv/cpnd/cpnd_proc.c | 314
>>>> ++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
>>>>
>>>> ++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
>>>>
>>>> +++++++++++++++++++++++++++++++++++++++++++++++++++++++++++---
>>>>> 20 files changed, 1189 insertions(+), 11 deletions(-)
>>>>>
>>>>>
>>>>> Testing Commands:
>>>>> -----------------
>>>>> -
>>>>>
>>>>> Testing, Expected Results:
>>>>> --------------------------
>>>>> -
>>>>>
>>>>>
>>>>> Conditions of Submission:
>>>>> -------------------------
>>>>> <<HOW MANY DAYS BEFORE PUSHING, CONSENSUS ETC>>
>>>>>
>>>>>
>>>>> Arch Built Started Linux distro
>>>>> -------------------------------------------
>>>>> mips n n
>>>>> mips64 n n
>>>>> x86 n n
>>>>> x86_64 n n
>>>>> powerpc n n
>>>>> powerpc64 n n
>>>>>
>>>>>
>>>>> Reviewer Checklist:
>>>>> -------------------
>>>>> [Submitters: make sure that your review doesn't trigger any
>>>>> checkmarks!]
>>>>>
>>>>>
>>>>> Your checkin has not passed review because (see checked entries):
>>>>>
>>>>> ___ Your RR template is generally incomplete; it has too many blank
>>>> entries
>>>>> that need proper data filled in.
>>>>>
>>>>> ___ You have failed to nominate the proper persons for review and
>>>>> push.
>>>>>
>>>>> ___ Your patches do not have proper short+long header
>>>>>
>>>>> ___ You have grammar/spelling in your header that is unacceptable.
>>>>>
>>>>> ___ You have exceeded a sensible line length in your
>>>> headers/comments/text.
>>>>> ___ You have failed to put in a proper Trac Ticket # into your
>>>>> commits.
>>>>>
>>>>> ___ You have incorrectly put/left internal data in your comments/files
>>>>> (i.e. internal bug tracking tool IDs, product names etc)
>>>>>
>>>>> ___ You have not given any evidence of testing beyond basic build
>>>>> tests.
>>>>> Demonstrate some level of runtime or other sanity testing.
>>>>>
>>>>> ___ You have ^M present in some of your files. These have to be
>>>>> removed.
>>>>>
>>>>> ___ You have needlessly changed whitespace or added whitespace crimes
>>>>> like trailing spaces, or spaces before tabs.
>>>>>
>>>>> ___ You have mixed real technical changes with whitespace and other
>>>>> cosmetic code cleanup changes. These have to be separate
>>>>> commits.
>>>>>
>>>>> ___ You need to refactor your submission into logical chunks; there is
>>>>> too much content into a single commit.
>>>>>
>>>>> ___ You have extraneous garbage in your review (merge commits etc)
>>>>>
>>>>> ___ You have giant attachments which should never have been sent;
>>>>> Instead you should place your content in a public tree to be
>>>>> pulled.
>>>>>
>>>>> ___ You have too many commits attached to an e-mail; resend as
>>>>> threaded
>>>>> commits, or place in a public tree for a pull.
>>>>>
>>>>> ___ You have resent this content multiple times without a clear
>>>>> indication
>>>>> of what has changed between each re-send.
>>>>>
>>>>> ___ You have failed to adequately and individually address all of the
>>>>> comments and change requests that were proposed in the
>>>>> initial
>>>> review.
>>>>> ___ You have a misconfigured ~/.hgrc file (i.e. username, email
>>>>> etc)
>>>>>
>>>>> ___ Your computer have a badly configured date and time; confusing the
>>>>> the threaded patch review.
>>>>>
>>>>> ___ Your changes affect IPC mechanism, and you don't present any
>>>>> results
>>>>> for in-service upgradability test.
>>>>>
>>>>> ___ Your changes affect user manual and documentation, your patch
>>>>> series
>>>>> do not contain the patch that updates the Doxygen manual.
>>>>>
>
------------------------------------------------------------------------------
Site24x7 APM Insight: Get Deep Visibility into Application Performance
APM + Mobile APM + RUM: Monitor 3 App instances at just $35/Month
Monitor end-to-end web transactions and take corrective actions now
Troubleshoot faster and improve end-user experience. Signup Now!
http://pubads.g.doubleclick.net/gampad/clk?id=272487151&iu=/4140
_______________________________________________
Opensaf-devel mailing list
[email protected]
https://lists.sourceforge.net/lists/listinfo/opensaf-devel