Hi Nhat Pham,

I am currently working on `MDS:TIPC include node name as a part of callback_info events [#1522]`; I will start the review as soon as that is pushed.

-AVM

On 3/3/2016 3:41 PM, Nhat Pham wrote:
> Hi Mahesh,
>
> Have you reviewed the patch?
>
> Best regards,
> Nhat Pham
>
> -----Original Message-----
> From: A V Mahesh [mailto:[email protected]]
> Sent: Monday, February 29, 2016 1:15 PM
> To: Nhat Pham <[email protected]>; [email protected]
> Cc: [email protected]
> Subject: Re: [devel] [PATCH 0 of 1] Review Request for cpsv: Support
> preserving and recovering checkpoint replicas during headless state V3 [#1621]
>
> Hi Nhat Pham,
>
> I will review the V3 patch, do the final functional testing, and get back to
> you soon.
> (I may take some time; I also need to work on my published MDS
> enhancements.)
>
> -AVM
>
>
> On 2/29/2016 9:39 AM, Nhat Pham wrote:
>> Hi,
>>
>> Following is a summary of the updates in V3:
>>
>> Comment 1: This functionality should be guarded by a check that the Hydra
>> configuration is enabled in IMM (attrName =
>> const_cast<SaImmAttrNameT>("scAbsenceAllowed")).
>>
>> Status: Included in V3
>>
>> Comment 2: Limit the scope of the CPSV service so that non-collocated
>> checkpoint creation returns NOT_SUPPORTED if the cluster is running with
>> IMMSV_SC_ABSENCE_ALLOWED (headless state configuration is enabled at
>> cluster startup; it is currently not configurable at run time, so there
>> is no chance of a run-time configuration change).
>>
>> Status: No change in code. CPSV still supports non-collocated
>> checkpoints even if IMMSV_SC_ABSENCE_ALLOWED is enabled.
>>
>> Comment 3: This is about the case where the checkpoint node director (cpnd)
>> crashes during headless state. In this case the cpnd can't finish
>> starting because it can't initialize the CLM service.
>> Then, after a timeout, AMF triggers a restart again. Finally, the
>> node is rebooted.
>> It is expected that this problem should not lead to a node reboot.
>>
>> Status: Included in V3. CPND reinitializes the CLM service if the fault
>> TRY_AGAIN is returned.
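
The Comment 3 fix above (retry CLM initialization while TRY_AGAIN is returned) can be sketched roughly as follows. This is only a minimal illustration and not the code in the patch: `clm_init_stub`, `init_with_retry` and `calls_until_ok` are hypothetical names, and the two return codes are local stand-ins for the definitions in saAis.h.

```c
/* Local stand-ins for the AIS return codes (assumption: the real
 * definitions come from saAis.h and are not included here). */
typedef enum { SA_AIS_OK = 1, SA_AIS_ERR_TRY_AGAIN = 6 } SaAisErrorT;

/* Hypothetical stub standing in for the CLM initialization call.
 * It answers TRY_AGAIN until calls_until_ok attempts have been made. */
static int calls_until_ok = 3;

static SaAisErrorT clm_init_stub(void)
{
	return (--calls_until_ok > 0) ? SA_AIS_ERR_TRY_AGAIN : SA_AIS_OK;
}

/* Retry while the service answers TRY_AGAIN, at most max_attempts times.
 * Returns the number of attempts used, or -1 if initialization failed. */
static int init_with_retry(SaAisErrorT (*init_fn)(void), int max_attempts)
{
	for (int attempt = 1; attempt <= max_attempts; attempt++) {
		SaAisErrorT rc = init_fn();

		if (rc == SA_AIS_OK)
			return attempt;
		if (rc != SA_AIS_ERR_TRY_AGAIN)
			return -1; /* any other fault is not retried */
		/* A real daemon would sleep or re-arm a timer here. */
	}
	return -1; /* still TRY_AGAIN after max_attempts */
}
```

Bounding the number of attempts keeps the daemon from spinning forever; how long CPND should actually wait between attempts during headless state is a design decision this sketch does not address.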
>>
>> Comment 4: The suggestion was to re-create the checkpoint without any
>> sections in case all replicas are lost. If the sections were
>> re-created, the application wouldn't know that data has been lost. I
>> think the BAD_HANDLE approach is okay since we have used it in other
>> services, but I see it as kind of a hack solution that is not really in line
>> with the specs.
>> The specs never intended BAD_HANDLE to be something that can happen
>> spontaneously on a previously valid handle, unless you are suffering
>> from memory corruption. In the future we could consider the
>> feasibility of avoiding spontaneous BAD_HANDLE where possible, and in
>> CKPT I think it may be possible by re-creating the checkpoints.
>>
>> Status: NOT included in V3.
>> This is a sizable change and requires a detailed design covering different
>> scenarios. I suggest creating an enhancement ticket for it.
>> What do you think?
>>
>> Best regards,
>> Nhat Pham
>>
>> -----Original Message-----
>> From: Nhat Pham [mailto:[email protected]]
>> Sent: Monday, February 29, 2016 11:06 AM
>> To: [email protected]; [email protected]
>> Cc: [email protected]
>> Subject: [devel] [PATCH 0 of 1] Review Request for cpsv: Support
>> preserving and recovering checkpoint replicas during headless state V3
>> [#1621]
>>
>> Summary: cpsv: Support preserving and recovering checkpoint replicas
>> during headless state V3 [#1621]
>> Review request for Trac Ticket(s): 1621
>> Peer Reviewer(s): [email protected]; [email protected]
>> Pull request to: [email protected]
>> Affected branch(es): default
>> Development branch: default
>>
>> --------------------------------
>> Impacted area       Impact y/n
>> --------------------------------
>>  Docs                    n
>>  Build system            n
>>  RPM/packaging           n
>>  Configuration files     n
>>  Startup scripts         n
>>  SAF services            y
>>  OpenSAF services        n
>>  Core libraries          n
>>  Samples                 n
>>  Tests                   n
>>  Other                   n
>>
>>
>> Comments (indicate scope for each "y" above):
>> ---------------------------------------------
>>
>> changeset 8559fe4cea27efc8234f7cf779f3c7413efcd40f
>> Author: Nhat Pham <[email protected]>
>> Date: Mon, 29 Feb 2016 11:02:15 +0700
>>
>> cpsv: Support preserving and recovering checkpoint replicas during
>> headless state V3 [#1621]
>>
>> Background:
>> ----------
>> This enhancement preserves checkpoint replicas when both SCs are down
>> (headless state) and recovers the replicas when one of the SCs comes up
>> again. If both SCs go down, checkpoint replicas on surviving nodes
>> remain. When an SC is available again, surviving replicas are
>> automatically registered in the SC checkpoint database. Content in
>> surviving replicas is kept intact and synchronized to new replicas.
>>
>> When no SC is available, client API calls that change checkpoint
>> configuration, which requires SC communication, are rejected. Client API
>> calls that read and write existing checkpoint replicas still work.
>>
>> Limitation: The CKPT service does not support recovering checkpoints in
>> the following cases:
>> - A checkpoint that was unlinked before the headless state.
>> - A non-collocated checkpoint whose active replica is located on an SC.
>> - A non-collocated checkpoint whose active replica is located on a PL,
>> where this PL restarts during the headless state.
>> In these cases, the checkpoint replica is destroyed. The fault code
>> SA_AIS_ERR_BAD_HANDLE is returned when the client accesses the
>> checkpoint. The client must re-open the checkpoint.
>>
>> While in headless state, accessing checkpoint replicas does not work if
>> the node which hosts the active replica goes down. It will work again
>> when an SC is available again.
>>
>> Solution:
>> ---------
>> The solution for this enhancement includes 2 parts:
>>
>> 1. Destroy the un-recoverable checkpoints described above when both SCs
>> are down: When both SCs are down, the CPND deletes un-recoverable
>> checkpoint nodes and replicas on PLs. Then it requests CPA to destroy
>> the corresponding checkpoint node by using the new message
>> CPA_EVT_ND2A_CKPT_DESTROY.
>>
>> 2. Update CPD with checkpoint information: When an active SC is up after
>> headless, CPND updates CPD with checkpoint information by using the new
>> message CPD_EVT_ND2D_CKPT_INFO_UPDATE instead of
>> CPD_EVT_ND2D_CKPT_CREATE. This is because using
>> CPD_EVT_ND2D_CKPT_CREATE would create a new ckpt_id for the checkpoint,
>> which might differ from the current ckpt_id. The CPD collects
>> checkpoint information within 6s. During this update window, the
>> following requests are rejected with fault code SA_AIS_ERR_TRY_AGAIN:
>> - CPD_EVT_ND2D_CKPT_CREATE
>> - CPD_EVT_ND2D_CKPT_UNLINK
>> - CPD_EVT_ND2D_ACTIVE_SET
>> - CPD_EVT_ND2D_CKPT_RDSET
>>
>>
>> Complete diffstat:
>> ------------------
>>  osaf/libs/agents/saf/cpa/cpa_proc.c | 52 ++++++++++++++++++++++++++
>>  osaf/libs/common/cpsv/cpsv_edu.c | 43 +++++++++++++++++++++
>>  osaf/libs/common/cpsv/include/cpd_cb.h | 4 ++
>>  osaf/libs/common/cpsv/include/cpd_imm.h | 2 +
>>  osaf/libs/common/cpsv/include/cpd_proc.h | 7 +++
>>  osaf/libs/common/cpsv/include/cpd_tmr.h | 3 +-
>>  osaf/libs/common/cpsv/include/cpnd_cb.h | 3 +
>>  osaf/libs/common/cpsv/include/cpnd_init.h | 3 +
>>  osaf/libs/common/cpsv/include/cpsv_evt.h | 20 ++++++++++
>>  osaf/services/saf/cpsv/cpd/Makefile.am | 3 +-
>>  osaf/services/saf/cpsv/cpd/cpd_evt.c | 229
>> ++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
>> ++++++++++++++++++++++++++++++++++++++
>>  osaf/services/saf/cpsv/cpd/cpd_imm.c | 202
>> ++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
>> +++++++++++++++++++++++++
>>  osaf/services/saf/cpsv/cpd/cpd_init.c | 26 ++++++++++++-
>>  osaf/services/saf/cpsv/cpd/cpd_proc.c | 309
>> ++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
>> ++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
>> ++
>>  osaf/services/saf/cpsv/cpd/cpd_tmr.c | 7 +++
>>  osaf/services/saf/cpsv/cpnd/Makefile.am | 6 ++-
>>  osaf/services/saf/cpsv/cpnd/cpnd_db.c | 16 ++++++++
>>  osaf/services/saf/cpsv/cpnd/cpnd_evt.c | 24 ++++++++++++
>>  osaf/services/saf/cpsv/cpnd/cpnd_init.c | 34 ++++++++++++++++-
>>  osaf/services/saf/cpsv/cpnd/cpnd_mds.c | 13 ++++++
>>  osaf/services/saf/cpsv/cpnd/cpnd_proc.c | 429
>> ++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
>> ++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
>> ++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++--
>>  21 files changed, 1423 insertions(+), 12 deletions(-)
>>
>>
>> Testing Commands:
>> -----------------
>> -
>>
>> Testing, Expected Results:
>> --------------------------
>> -
>>
>>
>> Conditions of Submission:
>> -------------------------
>> -
>>
>>
>> Arch        Built    Started    Linux distro
>> -------------------------------------------
>> mips          n         n
>> mips64        n         n
>> x86           n         n
>> x86_64        y         y
>> powerpc       n         n
>> powerpc64     n         n
>>
>>
>> Reviewer Checklist:
>> -------------------
>> [Submitters: make sure that your review doesn't trigger any
>> checkmarks!]
>>
>>
>> Your checkin has not passed review because (see checked entries):
>>
>> ___ Your RR template is generally incomplete; it has too many blank entries
>> that need proper data filled in.
>>
>> ___ You have failed to nominate the proper persons for review and push.
>>
>> ___ Your patches do not have proper short+long header
>>
>> ___ You have grammar/spelling in your header that is unacceptable.
>>
>> ___ You have exceeded a sensible line length in your headers/comments/text.
>>
>> ___ You have failed to put in a proper Trac Ticket # into your commits.
>>
>> ___ You have incorrectly put/left internal data in your comments/files
>> (i.e. internal bug tracking tool IDs, product names etc)
>>
>> ___ You have not given any evidence of testing beyond basic build tests.
>> Demonstrate some level of runtime or other sanity testing.
>>
>> ___ You have ^M present in some of your files. These have to be removed.
>>
>> ___ You have needlessly changed whitespace or added whitespace crimes
>> like trailing spaces, or spaces before tabs.
>>
>> ___ You have mixed real technical changes with whitespace and other
>> cosmetic code cleanup changes. These have to be separate commits.
>>
>> ___ You need to refactor your submission into logical chunks; there is
>> too much content in a single commit.
>>
>> ___ You have extraneous garbage in your review (merge commits etc)
>>
>> ___ You have giant attachments which should never have been sent;
>> instead you should place your content in a public tree to be pulled.
>>
>> ___ You have too many commits attached to an e-mail; resend as threaded
>> commits, or place in a public tree for a pull.
>>
>> ___ You have resent this content multiple times without a clear indication
>> of what has changed between each re-send.
>>
>> ___ You have failed to adequately and individually address all of the
>> comments and change requests that were proposed in the initial review.
>>
>> ___ You have a misconfigured ~/.hgrc file (i.e. username, email etc)
>>
>> ___ Your computer has a badly configured date and time, confusing
>> the threaded patch review.
>>
>> ___ Your changes affect the IPC mechanism, and you don't present any results
>> for an in-service upgradability test.
>>
>> ___ Your changes affect the user manual and documentation, but your patch
>> series does not contain the patch that updates the Doxygen manual.
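
The update window described in part 2 of the quoted Solution (CPD rejects four request types with SA_AIS_ERR_TRY_AGAIN while it collects checkpoint information for 6s) can be sketched roughly as below. This is an illustration under assumptions, not the patch's implementation: the event names come from the changeset description, but `cpd_evt_type_t`, `cpd_state_t`, `cpd_admit`, `UPDATE_WINDOW_MS` and all numeric values are hypothetical.

```c
#include <stdbool.h>

/* Local stand-ins for the AIS return codes (assumption: the real
 * definitions come from saAis.h and are not included here). */
typedef enum { SA_AIS_OK = 1, SA_AIS_ERR_TRY_AGAIN = 6 } SaAisErrorT;

/* Event names as listed in the changeset description; the type name
 * and the enumerator values are illustrative only. */
typedef enum {
	CPD_EVT_ND2D_CKPT_CREATE,
	CPD_EVT_ND2D_CKPT_UNLINK,
	CPD_EVT_ND2D_ACTIVE_SET,
	CPD_EVT_ND2D_CKPT_RDSET,
	CPD_EVT_ND2D_CKPT_INFO_UPDATE
} cpd_evt_type_t;

#define UPDATE_WINDOW_MS 6000L /* the 6s collection period */

/* Hypothetical state: whether CPD is still collecting checkpoint
 * information from the CPNDs, and when the collection started. */
typedef struct {
	bool updating;
	long window_start_ms;
} cpd_state_t;

/* Decide whether a request is processed or bounced with TRY_AGAIN. */
static SaAisErrorT cpd_admit(cpd_state_t *st, cpd_evt_type_t evt, long now_ms)
{
	/* Close the window once the collection period has elapsed. */
	if (st->updating && now_ms - st->window_start_ms >= UPDATE_WINDOW_MS)
		st->updating = false;

	if (!st->updating)
		return SA_AIS_OK;

	switch (evt) {
	case CPD_EVT_ND2D_CKPT_CREATE:
	case CPD_EVT_ND2D_CKPT_UNLINK:
	case CPD_EVT_ND2D_ACTIVE_SET:
	case CPD_EVT_ND2D_CKPT_RDSET:
		return SA_AIS_ERR_TRY_AGAIN; /* caller retries later */
	default:
		return SA_AIS_OK; /* e.g. CKPT_INFO_UPDATE is still accepted */
	}
}
```

Returning TRY_AGAIN rather than a hard failure lets the requester simply repeat the call after the window closes, so no request is permanently lost during recovery.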
>>
>>
>> _______________________________________________
>> Opensaf-devel mailing list
>> [email protected]
>> https://lists.sourceforge.net/lists/listinfo/opensaf-devel
