Summary: amf: fix track callback when multiple CLM nodes leave membership [#2372].
Review request for Trac Ticket(s): #2372
Peer Reviewer(s): AMF devs
Pull request to: <<LIST THE PERSON WITH PUSH ACCESS HERE>>
Affected branch(es): ALL
Development branch: <<IF ANY GIVE THE REPO URL>>

--------------------------------
Impacted area       Impact y/n
--------------------------------
 Docs                    n
 Build system            n
 RPM/packaging           n
 Configuration files     n
 Startup scripts         n
 SAF services            y
 OpenSAF services        n
 Core libraries          n
 Samples                 n
 Tests                   n
 Other                   n


Comments (indicate scope for each "y" above):
---------------------------------------------

changeset 1ee79821742a117265da9a4d5ba60617ac86e2e4
Author: Praveen Malviya <praveen.malv...@oracle.com>
Date:   Mon, 27 Mar 2017 15:25:18 +0530

    amf: fix track callback when multiple CLM nodes leave membership [#2372].

    In the reported issue, two CLM nodes are locked almost simultaneously.
    For one of the nodes the CLM lock times out and the user gets
    REPAIR_PENDING as the return code. Both payloads being locked host
    amf_demo in a 2N model.

    When AMFD gets the CLM track callback for PL-3, it starts terminating
    amf_demo on PL-3. While that termination is still in progress, the user
    CLM-locks PL-4 and AMF gets another track callback with rootCauseEntity
    set to PL-4. This callback also contains information about PL-3,
    because that node is still in the pending change phase. AMFD starts
    terminating amf_demo on PL-4, but at the same time it incorrectly
    responds for PL-3 with the invocationId of the PL-4 callback. CLM
    assumes that the CHANGE_START step has completed for PL-4 and sends the
    completion callback for PL-4. In this callback, AMF clears the internal
    flags that monitor the graceful removal of nodes. Since AMF never
    responds to the PL-3 callback, the node lock timer expires in CLMD,
    which sends the complete callback to AMF and returns REPAIR_PENDING to
    the user. AMF treats this as a node failover and tries to fail over
    PL-3.

    The patch fixes this problem in both AMFD and AMFND:
     - act on the CHANGE_START step only once for a node (amfd).
     - act on the COMPLETE step only when rootCauseEntity matches and the
       node is being removed gracefully (amfd).
     - act only once in the track callback for the COMPLETE step (amfnd).

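The three rules listed in the commit message can be illustrated with a small standalone sketch. This is not the code from src/amf/amfd/clm.cc: `TrackHandler`, `TrackCallback`, `Entry`, `Step` and the invocation ids 101 to 103 are hypothetical names and values made up for illustration, and the real AMFD works on the CLM notification buffer and its own node objects. The sketch only shows the idea of remembering, per node, the invocation id of the first START callback, and of reacting to a COMPLETED callback only for the root-cause node.

```cpp
#include <cassert>
#include <cstdint>
#include <map>
#include <string>
#include <vector>

// Hypothetical, simplified stand-ins for the CLM track callback data.
enum class Step { kStart, kCompleted };

struct Entry {
  std::string node;  // node named in this notification entry
  bool leaving;      // true if the entry reports the node as leaving
};

struct TrackCallback {
  Step step;
  std::string root_cause_entity;
  std::uint64_t invocation;
  std::vector<Entry> entries;
};

class TrackHandler {
 public:
  // Returns the nodes this callback causes the handler to act on.
  std::vector<std::string> on_track(const TrackCallback& cb) {
    std::vector<std::string> acted;
    for (const Entry& e : cb.entries) {
      if (!e.leaving) continue;
      if (cb.step == Step::kStart) {
        // Act on the START step only once per node: emplace() leaves an
        // existing entry, and its original invocation id, untouched.
        if (pending_.emplace(e.node, cb.invocation).second) {
          acted.push_back(e.node);
        }
      } else if (e.node == cb.root_cause_entity &&
                 pending_.erase(e.node) > 0) {
        // COMPLETED: only for the root-cause node, and only if a graceful
        // removal was being tracked for it.
        acted.push_back(e.node);
      }
    }
    return acted;
  }

  // Invocation id to answer CLM with once `node` has been cleaned up.
  // Returns 0 when nothing is tracked (0 is assumed unused in this sketch).
  std::uint64_t response_invocation(const std::string& node) const {
    auto it = pending_.find(node);
    return it == pending_.end() ? 0 : it->second;
  }

 private:
  std::map<std::string, std::uint64_t> pending_;  // node -> invocation id
};

// Replays the sequence from the commit message with made-up invocation ids.
inline void replay_reported_scenario() {
  TrackHandler amfd;

  // Lock of PL-3: START callback lists PL-3 only.
  auto a = amfd.on_track({Step::kStart, "PL-3", 101u, {{"PL-3", true}}});
  assert(a.size() == 1 && a[0] == "PL-3");

  // Lock of PL-4 while PL-3 is still pending: the callback lists both nodes,
  // but only PL-4 is new, so only PL-4 is acted on.
  auto b = amfd.on_track(
      {Step::kStart, "PL-4", 102u, {{"PL-3", true}, {"PL-4", true}}});
  assert(b.size() == 1 && b[0] == "PL-4");

  // PL-3 is still answered with its own invocation id, not the one of PL-4.
  assert(amfd.response_invocation("PL-3") == 101u);
  assert(amfd.response_invocation("PL-4") == 102u);

  // COMPLETED for PL-4 must not clear the state kept for PL-3.
  auto c = amfd.on_track(
      {Step::kCompleted, "PL-4", 103u, {{"PL-3", true}, {"PL-4", true}}});
  assert(c.size() == 1 && c[0] == "PL-4");
  assert(amfd.response_invocation("PL-3") == 101u);
}
```

Keying the pending state by node name is a choice made for this sketch: it makes the "only once per node" rule a single map insertion and keeps each node's invocation id separate from later callbacks that merely mention the node again.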
Complete diffstat:
------------------
 src/amf/amfd/clm.cc  |  43 ++++++++++++++++++++++++++++++-------------
 src/amf/amfnd/clm.cc |   6 ++++--
 2 files changed, 34 insertions(+), 15 deletions(-)


Testing Commands:
-----------------
tested both the cases mentioned in the ticket.


Testing, Expected Results:
--------------------------
Both the cases passed.


Conditions of Submission:
-------------------------
Ack from any reviewer.


Arch         Built     Started    Linux distro
-------------------------------------------
mips           n          n
mips64         n          n
x86            n          n
x86_64         y          y
powerpc        n          n
powerpc64      n          n


Reviewer Checklist:
-------------------
[Submitters: make sure that your review doesn't trigger any checkmarks!]


Your checkin has not passed review because (see checked entries):

___ Your RR template is generally incomplete; it has too many blank entries
    that need proper data filled in.

___ You have failed to nominate the proper persons for review and push.

___ Your patches do not have proper short+long header

___ You have grammar/spelling in your header that is unacceptable.

___ You have exceeded a sensible line length in your headers/comments/text.

___ You have failed to put in a proper Trac Ticket # into your commits.

___ You have incorrectly put/left internal data in your comments/files
    (i.e. internal bug tracking tool IDs, product names etc)

___ You have not given any evidence of testing beyond basic build tests.
    Demonstrate some level of runtime or other sanity testing.

___ You have ^M present in some of your files. These have to be removed.

___ You have needlessly changed whitespace or added whitespace crimes
    like trailing spaces, or spaces before tabs.

___ You have mixed real technical changes with whitespace and other
    cosmetic code cleanup changes. These have to be separate commits.

___ You need to refactor your submission into logical chunks; there is
    too much content into a single commit.

___ You have extraneous garbage in your review (merge commits etc)

___ You have giant attachments which should never have been sent;
    Instead you should place your content in a public tree to be pulled.

___ You have too many commits attached to an e-mail; resend as threaded
    commits, or place in a public tree for a pull.

___ You have resent this content multiple times without a clear indication
    of what has changed between each re-send.

___ You have failed to adequately and individually address all of the
    comments and change requests that were proposed in the initial review.

___ You have a misconfigured ~/.hgrc file (i.e. username, email etc)

___ Your computer has a badly configured date and time; confusing the
    threaded patch review.

___ Your changes affect IPC mechanism, and you don't present any results
    for in-service upgradability test.

___ Your changes affect user manual and documentation, your patch series
    do not contain the patch that updates the Doxygen manual.

_______________________________________________
Opensaf-devel mailing list
Opensaf-devel@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/opensaf-devel