Hi Nhat Pham,

On 2/15/2016 3:08 PM, Nhat Pham wrote:
> Hi Mahesh,
>
> It's good. Thank you. :)
>
> [AVM]  Upon rejoining of the SCs, the replica should be re-created regardless
> of whether another application has it open on PL4.
>                (Note: this comment is based on your explanation; I have not yet
> reviewed/tested it.
>                   Currently I am struggling with the SCs not rejoining
> after headless state; I can provide more on this once I complete my
> review/testing.)
>
> [Nhat] To make cloud resilience work, you need the patches from the other
> services (log, amf, clm, ntf).
> @Minh: I heard that you created a tar file which includes all the patches. Could you
> please send it to Mahesh? Thanks

I was able to resolve the issue. I was using RPMs to test, and
osafimmloadd/osafimmpbed are not yet part of the payload RPMs
(in the headless case the IMMND coordinator will be on a payload, so we
need them on the payload as well).
As a workaround, we just copy osafimmloadd/osafimmpbed manually to the
payload; we will soon receive a patch
that separates the IMM tools into a separate RPM and makes it part of the
payload RPMs.

-AVM

>
> [AVM] I understand that. Before I comment more on this, please allow me to
> understand;
>               I am still not very clear on the headless design in detail.
>               For example, the cluster membership of PLs during headless state:
>               in the absence of SCs (CLMD), are the PLs considered
> cluster nodes or not (cluster membership)?
>
> [Nhat] I don't know much about this.
> @Anders: Could you please comment on this? Thanks
>
> Best regards,
> Nhat Pham
>
> -----Original Message-----
> From: A V Mahesh [mailto:mahesh.va...@oracle.com]
> Sent: Monday, February 15, 2016 11:19 AM
> To: Nhat Pham <nhat.p...@dektech.com.au>; anders.wid...@ericsson.com
> Cc: opensaf-devel@lists.sourceforge.net; 'Beatriz Brandao'
> <beatriz.bran...@ericsson.com>
> Subject: Re: [PATCH 0 of 1] Review Request for cpsv: Support preserving and
> recovering checkpoint replicas during headless state V2 [#1621]
>
> Hi Nhat Pham,
>
> How did your holiday go?
>
> Please find my comments below
>
> On 2/15/2016 8:43 AM, Nhat Pham wrote:
>> Hi Mahesh,
>>
>> For the comment 1, the patch will be updated accordingly.
> [AVM]  Please hold; I will provide more comments this week, so we can
> have a consolidated V3.
>> For the comment 2, I think the CKPT service will not be backward
>> compatible if the scAbsenceAllowed is true.
>> The client can't create non-collocated checkpoint on SCs.
>>
>> Furthermore, this solution only protects the CKPT service from the
>> case "The non-collocated checkpoint  is created on a SC"
>> there are still the cases where the replicas are completely lost. Ex:
>>
>> - The non-collocated checkpoint created on a PL. The PL reboots. Both
>> replicas now locate on SCs. Then, headless state happens. All replicas are
>> lost.
>> - The non-collocated checkpoint has active replica locating on a PL
>> and this PL restarts during headless state
>> - The non-collocated checkpoint is created on PL3. This checkpoint is
>> also opened on PL4. Then SCs and PL3 reboot.
> [AVM]  Upon rejoining of the SCs, the replica should be re-created regardless
> of whether another application has it open on PL4.
>                (Note: this comment is based on your explanation; I have not yet
> reviewed/tested it.
>                   Currently I am struggling with the SCs not rejoining
> after headless state; I can provide more on this once I complete my
> review/testing.)
>> In this case, all replicas are lost and the client has to create it again.
>>
>> In case multiple nodes (including SCs) reboot, losing replicas
>> is unpreventable. The patch is to recover the checkpoints in possible cases.
>> What do you think?
> [AVM] I understand that. Before I comment more on this, please allow
> me to understand;
>       I am still not very clear on the headless design in detail.
>
>       For example, the cluster membership of PLs during headless state:
>       in the absence of SCs (CLMD), are the PLs considered
>       cluster nodes or not (cluster membership)?
>
>         - If they are not considered cluster nodes, the Checkpoint Service
>           API should leverage the SA Forum Cluster Membership Service,
>           and APIs can fail with SA_AIS_ERR_UNAVAILABLE.
>
>         - If they are considered cluster nodes, we need to follow all the
>           rules defined in the SAI-AIS-CKPT-B.02.02 specification.
>
>       So give me some more time to review it completely, so that we
>       can have a consolidated patch V3.
>
> -AVM
>> Best regards,
>> Nhat Pham
>>
>> -----Original Message-----
>> From: A V Mahesh [mailto:mahesh.va...@oracle.com]
>> Sent: Friday, February 12, 2016 11:10 AM
>> To: Nhat Pham <nhat.p...@dektech.com.au>; anders.wid...@ericsson.com
>> Cc: opensaf-devel@lists.sourceforge.net; Beatriz Brandao
>> <beatriz.bran...@ericsson.com>
>> Subject: Re: [PATCH 0 of 1] Review Request for cpsv: Support
>> preserving and recovering checkpoint replicas during headless state V2
>> [#1621]
>>
>>
>> Comment 2 :
>>
>> After incorporating comment 1, all the limitations should be
>> prevented when the Hydra configuration is enabled in IMM.
>>
>> For example: if an application tries to create a non-collocated
>> checkpoint whose active replica would be generated/located on an SC,
>> then, regardless of whether the heads (SCs) exist or not, the call
>> should return SA_AIS_ERR_NOT_SUPPORTED.
>>
>> In other words, do this rather than allowing a non-collocated
>> checkpoint to be created while the heads (SCs) exist, and then having
>> that checkpoint become unrecoverable after the heads (SCs) rejoin.
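
To make the proposal above concrete, here is a minimal C sketch of one way to read it. It is not code from the patch: the `sk_` names, the request struct and the stand-in result codes are all hypothetical, and it assumes the create path already knows where the active replica would be placed.

```c
#include <stdbool.h>

/* Stand-in result codes for illustration only; real code would use the
 * SaAisErrorT values from saAis.h. */
typedef enum { SK_OK, SK_ERR_NOT_SUPPORTED } sk_result_t;

/* Hypothetical description of a checkpoint-create request. */
typedef struct {
    bool collocated;           /* true for a collocated checkpoint */
    bool active_replica_on_sc; /* true if the active replica would sit on an SC */
} sk_create_req_t;

/* Reject up front the case that could not be recovered after headless
 * state, but only when SC absence is allowed (assumed flag). */
sk_result_t sk_validate_create(const sk_create_req_t *req,
                               bool sc_absence_allowed)
{
    if (sc_absence_allowed && !req->collocated && req->active_replica_on_sc)
        return SK_ERR_NOT_SUPPORTED;
    return SK_OK;
}
```

With the flag off, the same request is accepted, so existing deployments would keep their current behaviour under this reading.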
>>
>> ==============================================================================
>>>     Limitation: The CKPT service doesn't support recovering checkpoints in
>>>     following cases:
>>>     . The checkpoint which is unlinked before headless.
>>>     . The non-collocated checkpoint has active replica locating on SC.
>>>     . The non-collocated checkpoint has active replica locating on a PL
>>>       and this PL restarts during headless state.
>>>     In this cases, the checkpoint replica is destroyed. The fault code
>>>     SA_AIS_ERR_BAD_HANDLE is returned when the client accesses the
>>>     checkpoint in these cases. The client must re-open the checkpoint.
>> ==============================================================================
>>
>> -AVM
>>
>>
>> On 2/11/2016 12:52 PM, A V Mahesh wrote:
>>> Hi,
>>>
>>> I just started reviewing the patch; I will give comments as soon as
>>> I come across any, to save some time.
>>>
>>> Comment 1:
>>> This functionality should be guarded by a check that the Hydra
>>> configuration is enabled in IMM: attrName =
>>> const_cast<SaImmAttrNameT>("scAbsenceAllowed")
>>>
>>> Please see how the LOG/AMF services implemented it, for example.
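
As a rough illustration of the guard asked for in Comment 1, a sketch in C follows. It does not show how LOG/AMF actually read the attribute; the `sk_` names are hypothetical, and it assumes the value of scAbsenceAllowed has already been fetched from IMM as an unsigned integer, with a missing or zero value meaning the feature is off.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* attr_value points at the value read from IMM at start-up, or is NULL
 * when the attribute is not set (assumed convention for this sketch). */
bool sk_sc_absence_allowed(const uint32_t *attr_value)
{
    /* Assumption: missing or zero means headless operation is disabled. */
    return attr_value != NULL && *attr_value > 0;
}

/* Hypothetical selector: take the headless-recovery path only when the
 * configuration enables it, otherwise keep the existing behaviour. */
typedef enum { SK_PATH_LEGACY, SK_PATH_HEADLESS_RECOVERY } sk_path_t;

sk_path_t sk_select_path(const uint32_t *attr_value)
{
    return sk_sc_absence_allowed(attr_value) ? SK_PATH_HEADLESS_RECOVERY
                                             : SK_PATH_LEGACY;
}
```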
>>>
>>> -AVM
>>>
>>>
>>> On 1/29/2016 1:02 PM, Nhat Pham wrote:
>>>> Hi Mahesh,
>>>>
>>>> As described in the README, the CKPT service returns
>>>> SA_AIS_ERR_TRY_AGAIN fault code in this case.
>>>> I guess it's same for other services.
>>>>
>>>> @Anders: Could you please confirm this?
>>>>
>>>> Best regards,
>>>> Nhat Pham
>>>>
>>>> -----Original Message-----
>>>> From: A V Mahesh [mailto:mahesh.va...@oracle.com]
>>>> Sent: Friday, January 29, 2016 2:11 PM
>>>> To: Nhat Pham <nhat.p...@dektech.com.au>; anders.wid...@ericsson.com
>>>> Cc: opensaf-devel@lists.sourceforge.net
>>>> Subject: Re: [PATCH 0 of 1] Review Request for cpsv: Support
>>>> preserving and recovering checkpoint replicas during headless state
>>>> V2 [#1621]
>>>>
>>>> Hi,
>>>>
>>>> On 1/29/2016 11:45 AM, Nhat Pham wrote:
>>>>>       -  The behavior of application will be consistent with other
>>>>> saf services like imm/amf behavior  during headless state.
>>>>> [Nhat] I'm not clear what you mean about "consistent"?
>>>> In the absence of the Directors (SCs), what return values are
>>>> expected from the SAF APIs (all services)
>>>>      that are not in a position to provide service at that moment?
>>>>
>>>> I think all services should return the same SAF errors. I think
>>>> currently we don't have that; maybe Anders Widel will help us.
>>>>
>>>> -AVM
>>>>
>>>>
>>>> On 1/29/2016 11:45 AM, Nhat Pham wrote:
>>>>> Hi Mahesh,
>>>>>
>>>>> Please see the attachment for the README. Let me know if there is
>>>>> any more information required.
>>>>>
>>>>> Regarding your comments:
>>>>>       -  during headless state  applications may behave like during
>>>>> CPND restart case [Nhat] Headless state and CPND restart are
>>>>> different events. Thus, the behavior is different.
>>>>> Headless state is a case where both SCs go down.
>>>>>
>>>>>       -  The behavior of application will be consistent with other
>>>>> saf services like imm/amf behavior  during headless state.
>>>>> [Nhat] I'm not clear what you mean about "consistent"?
>>>>>
>>>>> Best regards,
>>>>> Nhat Pham
>>>>>
>>>>> -----Original Message-----
>>>>> From: A V Mahesh [mailto:mahesh.va...@oracle.com]
>>>>> Sent: Friday, January 29, 2016 11:12 AM
>>>>> To: Nhat Pham <nhat.p...@dektech.com.au>;
>>>>> anders.wid...@ericsson.com
>>>>> Cc: opensaf-devel@lists.sourceforge.net
>>>>> Subject: Re: [PATCH 0 of 1] Review Request for cpsv: Support
>>>>> preserving and recovering checkpoint replicas during headless state
>>>>> V2 [#1621]
>>>>>
>>>>> Hi Nhat Pham,
>>>>>
>>>>> I started reviewing this patch, so can you please provide a README
>>>>> file with scope and limitations? That will help to define the
>>>>> testing/reviewing scope.
>>>>>
>>>>> The following are the minimum things we can keep in mind while
>>>>> reviewing/accepting the patch:
>>>>>
>>>>> - Not affecting existing functionality
>>>>>       -  During headless state, applications may behave as in the
>>>>>          CPND restart case
>>>>>       -  The minimum functionality of the application works
>>>>>       -  The behavior of the application will be consistent with
>>>>>          the behavior of other SAF services like imm/amf during
>>>>>          headless state.
>>>>>
>>>>> So please provide additional details in the README if any of
>>>>> the above is deviated from; that allows users to know about the
>>>>> limitations/deviations.
>>>>>
>>>>> -AVM
>>>>>
>>>>> On 1/4/2016 3:15 PM, Nhat Pham wrote:
>>>>>> Summary: cpsv: Support preserving and recovering checkpoint replicas during headless state [#1621]
>>>>>> Review request for Trac Ticket(s): #1621
>>>>>> Peer Reviewer(s): mahesh.va...@oracle.com; anders.wid...@ericsson.com
>>>>>> Pull request to: mahesh.va...@oracle.com
>>>>>> Affected branch(es): default
>>>>>> Development branch: default
>>>>>>
>>>>>> --------------------------------
>>>>>> Impacted area       Impact y/n
>>>>>> --------------------------------
>>>>>>       Docs                    n
>>>>>>       Build system            n
>>>>>>       RPM/packaging           n
>>>>>>       Configuration files     n
>>>>>>       Startup scripts         n
>>>>>>       SAF services            y
>>>>>>       OpenSAF services        n
>>>>>>       Core libraries          n
>>>>>>       Samples                 n
>>>>>>       Tests                   n
>>>>>>       Other                   n
>>>>>>
>>>>>>
>>>>>> Comments (indicate scope for each "y" above):
>>>>>> ---------------------------------------------
>>>>>>
>>>>>> changeset faec4a4445a4c23e8f630857b19aabb43b5af18d
>>>>>> Author:    Nhat Pham <nhat.p...@dektech.com.au>
>>>>>> Date:    Mon, 04 Jan 2016 16:34:33 +0700
>>>>>>
>>>>>>       cpsv: Support preserving and recovering checkpoint replicas
>>>>>>       during headless state [#1621]
>>>>>>
>>>>>>       Background:
>>>>>>       ----------
>>>>>>       This enhancement preserves checkpoint replicas when both SCs
>>>>>>       are down (headless state) and recovers replicas when one of the
>>>>>>       SCs is up again. If both SCs go down, checkpoint replicas on
>>>>>>       surviving nodes still remain. When an SC is available again,
>>>>>>       surviving replicas are automatically registered in the SC
>>>>>>       checkpoint database. Content in surviving replicas is kept
>>>>>>       intact and synchronized to new replicas.
>>>>>>
>>>>>>       When no SC is available, client API calls that change checkpoint
>>>>>>       configuration, which requires SC communication, are rejected.
>>>>>>       Client API calls that read and write existing checkpoint
>>>>>>       replicas still work.
>>>>>>
>>>>>>       Limitation: The CKPT service does not support recovering
>>>>>>       checkpoints in the following cases:
>>>>>>        - The checkpoint was unlinked before headless state.
>>>>>>        - The non-collocated checkpoint has its active replica located
>>>>>>          on an SC.
>>>>>>        - The non-collocated checkpoint has its active replica located
>>>>>>          on a PL and this PL restarts during headless state.
>>>>>>       In these cases, the checkpoint replica is destroyed. The fault
>>>>>>       code SA_AIS_ERR_BAD_HANDLE is returned when the client accesses
>>>>>>       the checkpoint. The client must re-open the checkpoint.
>>>>>>
>>>>>>       While in headless state, accessing checkpoint replicas does not
>>>>>>       work if the node which hosts the active replica goes down. It
>>>>>>       works again when an SC is available again.
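
From the client side, the fault codes named in this commit message suggest a small decision table. The following C sketch is only an illustration of that reading: the `sk_` enums are stand-ins for the real SaAisErrorT values, and the mapping of "any other error" to a failure is my assumption, not something the commit message states.

```c
/* Stand-in result codes; real code would use SaAisErrorT from saAis.h. */
typedef enum {
    SK_OK,
    SK_ERR_TRY_AGAIN,  /* service temporarily cannot handle the request */
    SK_ERR_BAD_HANDLE, /* replica was destroyed during headless state */
    SK_ERR_OTHER
} sk_result_t;

/* What a client could do next after a checkpoint API call. */
typedef enum {
    SK_ACT_CONTINUE, /* call succeeded */
    SK_ACT_RETRY,    /* wait a little and repeat the same call */
    SK_ACT_REOPEN,   /* re-open the checkpoint, then repeat the call */
    SK_ACT_FAIL      /* report the error to the caller (assumed default) */
} sk_action_t;

sk_action_t sk_client_next_action(sk_result_t rc)
{
    switch (rc) {
    case SK_OK:
        return SK_ACT_CONTINUE;
    case SK_ERR_TRY_AGAIN:
        return SK_ACT_RETRY;
    case SK_ERR_BAD_HANDLE:
        return SK_ACT_REOPEN;
    default:
        return SK_ACT_FAIL;
    }
}
```

A real client would also bound the number of retries; that is left out to keep the sketch short.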
>>>>>>
>>>>>>       Solution:
>>>>>>       ---------
>>>>>>       The solution for this enhancement includes 2 parts:
>>>>>>
>>>>>>       1. To destroy the un-recoverable checkpoints described above
>>>>>>       when both SCs are down: When both SCs are down, the CPND deletes
>>>>>>       un-recoverable checkpoint nodes and replicas on PLs. Then it
>>>>>>       requests the CPA to destroy the corresponding checkpoint node by
>>>>>>       using the new message CPA_EVT_ND2A_CKPT_DESTROY.
>>>>>>
>>>>>>       2. To update the CPD with checkpoint information: When an active
>>>>>>       SC is up after headless state, the CPND updates the CPD with
>>>>>>       checkpoint information by using the new message
>>>>>>       CPD_EVT_ND2D_CKPT_INFO_UPDATE instead of using
>>>>>>       CPD_EVT_ND2D_CKPT_CREATE. This is because, if
>>>>>>       CPD_EVT_ND2D_CKPT_CREATE is used, the CPND will create a new
>>>>>>       ckpt_id for the checkpoint, which might be different from the
>>>>>>       current ckpt_id. The CPD collects checkpoint information within
>>>>>>       6s. During this updating time, the following requests are
>>>>>>       rejected with fault code SA_AIS_ERR_TRY_AGAIN:
>>>>>>       - CPD_EVT_ND2D_CKPT_CREATE
>>>>>>       - CPD_EVT_ND2D_CKPT_UNLINK
>>>>>>       - CPD_EVT_ND2D_ACTIVE_SET
>>>>>>       - CPD_EVT_ND2D_CKPT_RDSET
>>>>>>
>>>>>> Complete diffstat:
>>>>>> ------------------
>>>>>>       osaf/libs/agents/saf/cpa/cpa_proc.c       |   52
>>>>> +++++++++++++++++++++++++++++++++++
>>>>>> osaf/libs/common/cpsv/cpsv_edu.c          |   43
>>>>> +++++++++++++++++++++++++++++
>>>>>> osaf/libs/common/cpsv/include/cpd_cb.h    |    3 ++
>>>>>>       osaf/libs/common/cpsv/include/cpd_imm.h   |    1 +
>>>>>>       osaf/libs/common/cpsv/include/cpd_proc.h  |    7 ++++
>>>>>>       osaf/libs/common/cpsv/include/cpd_tmr.h   |    3 +-
>>>>>>       osaf/libs/common/cpsv/include/cpnd_cb.h   |    1 +
>>>>>>       osaf/libs/common/cpsv/include/cpnd_init.h |    2 +
>>>>>>       osaf/libs/common/cpsv/include/cpsv_evt.h  |   20 +++++++++++++
>>>>>>       osaf/services/saf/cpsv/cpd/Makefile.am    |    3 +-
>>>>>>       osaf/services/saf/cpsv/cpd/cpd_evt.c      |  229
>>>>> ++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
>>>>>
>>>>> ++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
>>>>>
>>>>> ++++
>>>>>> osaf/services/saf/cpsv/cpd/cpd_imm.c      |  112
>>>>> ++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
>>>>>
>>>>>> osaf/services/saf/cpsv/cpd/cpd_init.c     |   20 ++++++++++++-
>>>>>>       osaf/services/saf/cpsv/cpd/cpd_proc.c     |  309
>>>>> ++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
>>>>>
>>>>> ++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
>>>>>
>>>>> +++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
>>>>>> osaf/services/saf/cpsv/cpd/cpd_tmr.c      |    7 ++++
>>>>>>       osaf/services/saf/cpsv/cpnd/cpnd_db.c     |   16 ++++++++++
>>>>>>       osaf/services/saf/cpsv/cpnd/cpnd_evt.c    |   22 +++++++++++++++
>>>>>>       osaf/services/saf/cpsv/cpnd/cpnd_init.c   |   23 ++++++++++++++-
>>>>>>       osaf/services/saf/cpsv/cpnd/cpnd_mds.c    |   13 ++++++++
>>>>>>       osaf/services/saf/cpsv/cpnd/cpnd_proc.c   |  314
>>>>> ++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
>>>>>
>>>>> ++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
>>>>>
>>>>> +++++++++++++++++++++++++++++++++++++++++++++++++++++++++++---
>>>>>>       20 files changed, 1189 insertions(+), 11 deletions(-)
>>>>>>
>>>>>>
>>>>>> Testing Commands:
>>>>>> -----------------
>>>>>> -
>>>>>>
>>>>>> Testing, Expected Results:
>>>>>> --------------------------
>>>>>> -
>>>>>>
>>>>>>
>>>>>> Conditions of Submission:
>>>>>> -------------------------
>>>>>>       <<HOW MANY DAYS BEFORE PUSHING, CONSENSUS ETC>>
>>>>>>
>>>>>>
>>>>>> Arch      Built     Started    Linux distro
>>>>>> -------------------------------------------
>>>>>> mips        n          n
>>>>>> mips64      n          n
>>>>>> x86         n          n
>>>>>> x86_64      n          n
>>>>>> powerpc     n          n
>>>>>> powerpc64   n          n
>>>>>>
>>>>>>
>>>>>> Reviewer Checklist:
>>>>>> -------------------
>>>>>> [Submitters: make sure that your review doesn't trigger any
>>>>>> checkmarks!]
>>>>>>
>>>>>>
>>>>>> Your checkin has not passed review because (see checked entries):
>>>>>>
>>>>>> ___ Your RR template is generally incomplete; it has too many blank
>>>>>>          entries that need proper data filled in.
>>>>>>
>>>>>> ___ You have failed to nominate the proper persons for review and push.
>>>>>>
>>>>>> ___ Your patches do not have proper short+long header
>>>>>>
>>>>>> ___ You have grammar/spelling in your header that is unacceptable.
>>>>>>
>>>>>> ___ You have exceeded a sensible line length in your
>>>>>>          headers/comments/text.
>>>>>>
>>>>>> ___ You have failed to put in a proper Trac Ticket # into your commits.
>>>>>>
>>>>>> ___ You have incorrectly put/left internal data in your comments/files
>>>>>>          (i.e. internal bug tracking tool IDs, product names etc)
>>>>>>
>>>>>> ___ You have not given any evidence of testing beyond basic build tests.
>>>>>>          Demonstrate some level of runtime or other sanity testing.
>>>>>>
>>>>>> ___ You have ^M present in some of your files. These have to be removed.
>>>>>>
>>>>>> ___ You have needlessly changed whitespace or added whitespace crimes
>>>>>>          like trailing spaces, or spaces before tabs.
>>>>>>
>>>>>> ___ You have mixed real technical changes with whitespace and other
>>>>>>          cosmetic code cleanup changes. These have to be separate commits.
>>>>>>
>>>>>> ___ You need to refactor your submission into logical chunks; there is
>>>>>>          too much content in a single commit.
>>>>>>
>>>>>> ___ You have extraneous garbage in your review (merge commits etc)
>>>>>>
>>>>>> ___ You have giant attachments which should never have been sent;
>>>>>>          Instead you should place your content in a public tree to be pulled.
>>>>>>
>>>>>> ___ You have too many commits attached to an e-mail; resend as threaded
>>>>>>          commits, or place in a public tree for a pull.
>>>>>>
>>>>>> ___ You have resent this content multiple times without a clear indication
>>>>>>          of what has changed between each re-send.
>>>>>>
>>>>>> ___ You have failed to adequately and individually address all of the
>>>>>>          comments and change requests that were proposed in the initial review.
>>>>>>
>>>>>> ___ You have a misconfigured ~/.hgrc file (i.e. username, email etc)
>>>>>>
>>>>>> ___ Your computer has a badly configured date and time; confusing
>>>>>>          the threaded patch review.
>>>>>>
>>>>>> ___ Your changes affect IPC mechanism, and you don't present any results
>>>>>>          for in-service upgradability test.
>>>>>>
>>>>>> ___ Your changes affect user manual and documentation, your patch series
>>>>>>          do not contain the patch that updates the Doxygen manual.
>>>>>>
>


_______________________________________________
Opensaf-devel mailing list
Opensaf-devel@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/opensaf-devel
