Hi Minh, I have started reviewing it.
Thanks, Praveen On 08-Sep-16 11:45 AM, minh chau wrote: > Hi, > > Please help to review this ticket. > > Thanks, > Minh > > On 05/09/16 11:43, Minh Hon Chau wrote: >> Summary: AMF: Fix SG unstable from admin continuation of nodegroup >> after headless [#1987] >> Review request for Trac Ticket(s): 1987 >> Peer Reviewer(s): AMF devs >> Pull request to: <<LIST THE PERSON WITH PUSH ACCESS HERE>> >> Affected branch(es): 5.1, default >> Development branch: default >> >> -------------------------------- >> Impacted area Impact y/n >> -------------------------------- >> Docs n >> Build system n >> RPM/packaging n >> Configuration files n >> Startup scripts n >> SAF services y >> OpenSAF services n >> Core libraries n >> Samples n >> Tests n >> Other n >> >> >> Comments (indicate scope for each "y" above): >> --------------------------------------------- >> <<EXPLAIN/COMMENT THE PATCH SERIES HERE>> >> >> changeset 3a6bc0645a7d6ce8ce57f51271e0c129f6ecf0f4 >> Author: minh-chau <[email protected]> >> Date: Mon, 05 Sep 2016 09:19:14 +1000 >> >> AMF: Fix SG unstable from admin continuation of nodegroup after >> headless >> [#1987] >> >> The SG becomes unstable because some variables used in nodegroup >> operation >> are not restored after headless if this admin operation on >> nodegroup was >> interrupted just before cluster goes into headless stage. >> >> In order to restore nodegroup operation, AMF needs to know exactly >> whether >> nodegroup operation was running during headless up on @susi >> assignment. If >> susi is in QUIESCED, QUIESCING or being removed while its related >> entities >> su, si, sg are not in LOCKED and SHUTTING_DOWN, that means either >> node or >> nodegroup MUST be in LOCKED or SHUTTING DOWN. In case of >> SHUTTING_DOWN >> saAmfNGAdminState, that's enough to know a nodegroup operation was >> running. >> However, if saAmfNGAdminState is in LOCKED, this case is an >> ambiguity of >> locking a node. The reason of differentiation of locking a node or >> node >> group is because 2N SG uses both AVD_SG_FSM_SG_ADMIN and >> AVD_SG_FSM_SU_OPER >> for node group operation while AVD_SG_FSM_SU_OPER is only used for >> node >> operation. When 2N SG uses AVD_SG_FSM_SG_ADMIN for nodegroup, the >> saAmfSGAdminState is borrowed (but not updated to IMM) to run the >> admin >> operation sequence. Therefore, after headless if >> AVD_SG_FSM_SG_ADMIN was >> being used for nodegroup then saAmfSGAdminState also needs to be set. >> >> Because SG FSM state is used to restore nodegroup during restoring >> susi >> assignment, the osaAmfSGFsmState (RTA) needs to be read earlier >> than reading >> susi assignment. This needs active AMFD become implementer earlier >> than >> reading sg object. There was a known ticket reported in 1720, if >> only make >> active AMFD as early implementer than it will cause the standby >> AMFD missing >> ccb apply callback.This patch also needs to set both active and >> standby AMFD >> become implementer and applier earlier so that AMFD can read >> osaAmfSGFsmState and do not cause regression of 1720. >> >> >> Complete diffstat: >> ------------------ >> osaf/services/saf/amf/amfd/include/node.h | 3 + >> osaf/services/saf/amf/amfd/include/sg.h | 1 + >> osaf/services/saf/amf/amfd/nodegroup.cc | 83 >> ++++++++++++++++++++++++++++++++++++++++++++++++++++ >> osaf/services/saf/amf/amfd/role.cc | 20 ++++++------ >> osaf/services/saf/amf/amfd/sg.cc | 15 +++++---- >> osaf/services/saf/amf/amfd/sgproc.cc | 2 +- >> osaf/services/saf/amf/amfd/siass.cc | 4 +- >> 7 files changed, 109 insertions(+), 19 deletions(-) >> >> >> Testing Commands: >> ----------------- >> Execute test cases in ticket >> >> >> Testing, Expected Results: >> -------------------------- >> Tests pass >> >> >> Conditions of Submission: >> ------------------------- >> ack from reviewers >> >> >> Arch Built Started Linux distro >> ------------------------------------------- >> mips n n >> mips64 n n >> x86 n n >> x86_64 y y >> powerpc n n >> powerpc64 n n >> >> >> Reviewer Checklist: >> ------------------- >> [Submitters: make sure that your review doesn't trigger any checkmarks!] >> >> >> Your checkin has not passed review because (see checked entries): >> >> ___ Your RR template is generally incomplete; it has too many blank >> entries >> that need proper data filled in. >> >> ___ You have failed to nominate the proper persons for review and push. >> >> ___ Your patches do not have proper short+long header >> >> ___ You have grammar/spelling in your header that is unacceptable. >> >> ___ You have exceeded a sensible line length in your >> headers/comments/text. >> >> ___ You have failed to put in a proper Trac Ticket # into your commits. >> >> ___ You have incorrectly put/left internal data in your comments/files >> (i.e. internal bug tracking tool IDs, product names etc) >> >> ___ You have not given any evidence of testing beyond basic build tests. >> Demonstrate some level of runtime or other sanity testing. >> >> ___ You have ^M present in some of your files. These have to be removed. >> >> ___ You have needlessly changed whitespace or added whitespace crimes >> like trailing spaces, or spaces before tabs. >> >> ___ You have mixed real technical changes with whitespace and other >> cosmetic code cleanup changes. These have to be separate commits. >> >> ___ You need to refactor your submission into logical chunks; there is >> too much content into a single commit. >> >> ___ You have extraneous garbage in your review (merge commits etc) >> >> ___ You have giant attachments which should never have been sent; >> Instead you should place your content in a public tree to be pulled. >> >> ___ You have too many commits attached to an e-mail; resend as threaded >> commits, or place in a public tree for a pull. >> >> ___ You have resent this content multiple times without a clear >> indication >> of what has changed between each re-send. >> >> ___ You have failed to adequately and individually address all of the >> comments and change requests that were proposed in the initial >> review. >> >> ___ You have a misconfigured ~/.hgrc file (i.e. username, email etc) >> >> ___ Your computer have a badly configured date and time; confusing the >> the threaded patch review. >> >> ___ Your changes affect IPC mechanism, and you don't present any results >> for in-service upgradability test. >> >> ___ Your changes affect user manual and documentation, your patch series >> do not contain the patch that updates the Doxygen manual. >> >> > ------------------------------------------------------------------------------ _______________________________________________ Opensaf-devel mailing list [email protected] https://lists.sourceforge.net/lists/listinfo/opensaf-devel
