Summary: osaf: fix issue with unlock if etcdctl times out [#2848]
Review request for Ticket(s): 2848
Peer Reviewer(s): Anders, Hans, Ravi
Pull request to:
Affected branch(es): develop
Development branch: ticket-2848
Base revision: 93e2808fb0bd3143a77e31dd2f0115a6596479ed
Personal repository: git://git.code.sf.net/u/userid-2226215/review

--------------------------------
Impacted area       Impact y/n
--------------------------------
 Docs                    n
 Build system            n
 RPM/packaging           n
 Configuration files     n
 Startup scripts         n
 SAF services            n
 OpenSAF services        y
 Core libraries          y
 Samples                 n
 Tests                   n
 Other                   n

Comments (indicate scope for each "y" above):
---------------------------------------------

revision 0855c361a9c736bfdac4dfd7f5c834a338963a3b
Author: Gary Lee <gary....@dektech.com.au>
Date:   Thu, 10 May 2018 18:48:53 +1000

osaf: fix issue with unlock if etcdctl times out [#2848]

In etcd3.plugin, if the unlock transaction times out for some reason,
unlock() would return 1, indicating that the lock is owned by another
node when it is not. This happened because lock_owner was not being
called correctly.

Also, PromoteThisNode() would report a successful lock attempt when the
attempt had failed, because rc was overwritten.

Complete diffstat:
------------------
 src/osaf/consensus/consensus.cc         |  8 ++++----
 src/osaf/consensus/plugins/etcd3.plugin | 23 +++++++++++++++--------
 2 files changed, 19 insertions(+), 12 deletions(-)

Testing Commands:
-----------------
0) Ensure split-brain prevention is enabled and remote fencing is
   disabled. Use etcd3.plugin.
1) Ensure SC-1 is active
2) Simulate split-brain by firewalling SC-1 and PL-3 from the rest of
   the cluster, i.e.
   [SC-1, PL-3] vs [SC-2, PL-4, PL-5]
3) SC-2 will issue a takeover request and it should be accepted
4) When SC-2 tries to demote SC-1, unlock should succeed

Previously, you might have seen something like this (it is hard to
reproduce):

2018-05-07 02:59:55.140 SC-2 osaffmd[284]: NO Controller Failover: Setting role to ACTIVE
2018-05-07 02:59:59.335 SC-2 osaffmd[284]: NO Locked failed: SC-1
2018-05-07 02:59:59.371 SC-2 osaffmd[284]: NO Current active controller is SC-1
2018-05-07 03:00:04.133 SC-2 osaffmd[284]: NO Takeover request accepted
2018-05-07 03:00:11.172 SC-2 osaffmd[284]: NO Unlock failed: #012lock_owner
2018-05-07 03:00:11.172 SC-2 osaffmd[284]: ER Lock is owned by another node
2018-05-07 03:00:11.172 SC-2 osaffmd[284]: ER Unlock failed (7)
2018-05-07 03:00:11.173 SC-2 osaffmd[284]: WA Unlock failed (7)
2018-05-07 03:00:11.386 SC-2 osaffmd[284]: NO Locked failed: SC-1
2018-05-07 03:00:11.420 SC-2 osaffmd[284]: NO Active controller set to SC-2

Testing, Expected Results:
--------------------------
See above

Conditions of Submission:
-------------------------
Ack from any reviewer or in one week

Arch        Built     Started    Linux distro
-------------------------------------------
mips          n          n
mips64        n          n
x86           n          n
x86_64        y          y
powerpc       n          n
powerpc64     n          n

Reviewer Checklist:
-------------------
[Submitters: make sure that your review doesn't trigger any checkmarks!]

Your checkin has not passed review because (see checked entries):

___ Your RR template is generally incomplete; it has too many blank
    entries that need proper data filled in.

___ You have failed to nominate the proper persons for review and push.

___ Your patches do not have proper short+long header

___ You have grammar/spelling in your header that is unacceptable.

___ You have exceeded a sensible line length in your
    headers/comments/text.

___ You have failed to put in a proper Trac Ticket # into your commits.

___ You have incorrectly put/left internal data in your comments/files
    (i.e.
    internal bug tracking tool IDs, product names etc)

___ You have not given any evidence of testing beyond basic build tests.
    Demonstrate some level of runtime or other sanity testing.

___ You have ^M present in some of your files. These have to be removed.

___ You have needlessly changed whitespace or added whitespace crimes
    like trailing spaces, or spaces before tabs.

___ You have mixed real technical changes with whitespace and other
    cosmetic code cleanup changes. These have to be separate commits.

___ You need to refactor your submission into logical chunks; there is
    too much content in a single commit.

___ You have extraneous garbage in your review (merge commits etc)

___ You have giant attachments which should never have been sent;
    instead you should place your content in a public tree to be pulled.

___ You have too many commits attached to an e-mail; resend as threaded
    commits, or place in a public tree for a pull.

___ You have resent this content multiple times without a clear
    indication of what has changed between each re-send.

___ You have failed to adequately and individually address all of the
    comments and change requests that were proposed in the initial
    review.

___ You have a misconfigured ~/.gitconfig file (i.e. user.name,
    user.email etc)

___ Your computer has a badly configured date and time, confusing the
    threaded patch review.

___ Your changes affect the IPC mechanism, and you don't present any
    results for an in-service upgradability test.

___ Your changes affect the user manual and documentation, but your
    patch series does not contain the patch that updates the Doxygen
    manual.

_______________________________________________
Opensaf-devel mailing list
Opensaf-devel@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/opensaf-devel
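
Illustration for reviewers: the commit message above says that
PromoteThisNode() reported success because rc was overwritten. The
actual change to consensus.cc is not included in this message, so the
following is only a generic sketch of that class of bug and one way to
avoid it. Every name in it (Status, FakeLock, FakeFollowUp,
PromoteBuggy, PromoteFixed, Demo) is hypothetical and is not taken from
the OpenSAF sources.

```cpp
#include <cassert>

// Hypothetical status type; OpenSAF's real return codes are not shown here.
enum class Status { kOk, kFailed };

// Stand-in for the plugin's lock call (assumption for illustration only).
Status FakeLock(bool succeed) {
  return succeed ? Status::kOk : Status::kFailed;
}

// Stand-in for a follow-up step that always succeeds.
Status FakeFollowUp() { return Status::kOk; }

// Buggy shape: the second assignment discards the lock result.
Status PromoteBuggy(bool lock_succeeds) {
  Status rc = FakeLock(lock_succeeds);
  rc = FakeFollowUp();  // overwrites a possible lock failure
  return rc;
}

// Safer shape: return as soon as the lock attempt fails.
Status PromoteFixed(bool lock_succeeds) {
  Status rc = FakeLock(lock_succeeds);
  if (rc != Status::kOk) {
    return rc;
  }
  return FakeFollowUp();
}

// Small self-check; call it from main() to exercise both variants.
void Demo() {
  assert(PromoteBuggy(false) == Status::kOk);      // failure is masked
  assert(PromoteFixed(false) == Status::kFailed);  // failure is reported
  assert(PromoteFixed(true) == Status::kOk);       // success still passes
}
```

In the buggy shape a failed lock attempt is indistinguishable from a
successful one at the call site. In the safer shape the early return
preserves the first error.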