Summary: osaf: fix issue with unlock if etcdctl times out [#2848]
Review request for Ticket(s): 2848
Peer Reviewer(s): Anders, Hans, Ravi 
Pull request to:
Affected branch(es): develop
Development branch: ticket-2848
Base revision: 93e2808fb0bd3143a77e31dd2f0115a6596479ed
Personal repository: git://git.code.sf.net/u/userid-2226215/review

--------------------------------
Impacted area       Impact y/n
--------------------------------
 Docs                    n
 Build system            n
 RPM/packaging           n
 Configuration files     n
 Startup scripts         n
 SAF services            n 
 OpenSAF services        y 
 Core libraries          y 
 Samples                 n
 Tests                   n
 Other                   n


Comments (indicate scope for each "y" above):
---------------------------------------------

revision 0855c361a9c736bfdac4dfd7f5c834a338963a3b
Author: Gary Lee <gary....@dektech.com.au>
Date:   Thu, 10 May 2018 18:48:53 +1000

osaf: fix issue with unlock if etcdctl times out [#2848]

In etcd3.plugin, if the unlock transaction timed out for any reason,
unlock() returned 1, indicating that the lock is owned by another node
even when it is not, because lock_owner was not being called correctly.
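To illustrate the intended unlock semantics, here is a small C++ model
of the decision the plugin has to make. The real plugin is a shell
script, so this is only a sketch. Return codes 0 (unlocked) and 1
(owned by another node) follow the description above. Using 2 for any
other failure, and the names ClassifyUnlock, txn_succeeded, owner and
expected_owner, are hypothetical. The point is that 1 should only be
returned when the owner actually read back from etcd differs from the
expected one. A timed-out transaction should yield a generic error.

```cpp
#include <cassert>
#include <string>

// Hypothetical result codes. 0 and 1 follow the commit message;
// 2 for "any other failure" is an assumption for this sketch.
enum UnlockResult { kUnlocked = 0, kOwnedByOther = 1, kOtherError = 2 };

// txn_succeeded:  whether the compare-and-delete transaction completed.
// owner:          current lock owner as read back from etcd,
//                 or "" if that lookup failed (e.g. it also timed out).
// expected_owner: node name the caller asked to unlock.
// All names here are hypothetical and do not come from etcd3.plugin.
UnlockResult ClassifyUnlock(bool txn_succeeded, const std::string& owner,
                            const std::string& expected_owner) {
  if (txn_succeeded) return kUnlocked;
  // Report "owned by another node" only if a different owner was read back.
  if (!owner.empty() && owner != expected_owner) return kOwnedByOther;
  // Timeout, failed owner lookup, or the expected owner still holds
  // the lock: this is not an ownership conflict.
  return kOtherError;
}
```

For example, ClassifyUnlock(false, "", "SC-1") returns kOtherError
rather than kOwnedByOther, which is the distinction described above
for a timed-out etcdctl call.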

Also, PromoteThisNode() reported a successful lock attempt when the
attempt had actually failed, because rc was overwritten.
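The second bug follows a common pattern, sketched below in C++.
Status, TryLock, FollowUpCall, PromoteBuggy and PromoteFixed are
hypothetical stand-ins, not the actual code in consensus.cc. The lock
result stored in rc is overwritten by a later call before it is
checked, so the caller sees success.

```cpp
#include <cassert>

// Hypothetical status type; consensus.cc uses its own return types.
enum class Status { kOk, kFailed };

// Hypothetical stand-ins: the lock attempt fails, the later call succeeds.
Status TryLock() { return Status::kFailed; }
Status FollowUpCall() { return Status::kOk; }

// Buggy pattern: rc from the lock attempt is overwritten before it is
// checked, so a failed lock is reported as kOk.
Status PromoteBuggy() {
  Status rc = TryLock();
  rc = FollowUpCall();
  return rc;
}

// Fixed pattern: act on the lock result before rc is reused.
Status PromoteFixed() {
  Status rc = TryLock();
  if (rc != Status::kOk) return rc;  // propagate the lock failure
  rc = FollowUpCall();
  return rc;
}
```

Here PromoteBuggy() returns Status::kOk even though TryLock() failed,
while PromoteFixed() returns Status::kFailed.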



Complete diffstat:
------------------
 src/osaf/consensus/consensus.cc         |  8 ++++----
 src/osaf/consensus/plugins/etcd3.plugin | 23 +++++++++++++++--------
 2 files changed, 19 insertions(+), 12 deletions(-)


Testing Commands:
-----------------
0) Ensure split-brain prevention is enabled and remote fencing is
   disabled. Use etcd3.plugin.
1) Ensure SC-1 is active
2) Simulate split-brain by firewalling SC-1 and PL-3 from the
   rest of the cluster, i.e. [SC-1, PL-3] vs [SC-2, PL-4, PL-5]
3) SC-2 will issue a takeover request and it should be accepted
4) When SC-2 tries to demote SC-1, unlock should succeed

Before this fix, output like the following could be seen (the problem
is hard to reproduce):

2018-05-07 02:59:55.140 SC-2 osaffmd[284]: NO Controller Failover: Setting role to ACTIVE
2018-05-07 02:59:59.335 SC-2 osaffmd[284]: NO Locked failed: SC-1
2018-05-07 02:59:59.371 SC-2 osaffmd[284]: NO Current active controller is SC-1
2018-05-07 03:00:04.133 SC-2 osaffmd[284]: NO Takeover request accepted
2018-05-07 03:00:11.172 SC-2 osaffmd[284]: NO Unlock failed: #012lock_owner
2018-05-07 03:00:11.172 SC-2 osaffmd[284]: ER Lock is owned by another node
2018-05-07 03:00:11.172 SC-2 osaffmd[284]: ER Unlock failed (7)
2018-05-07 03:00:11.173 SC-2 osaffmd[284]: WA Unlock failed (7)
2018-05-07 03:00:11.386 SC-2 osaffmd[284]: NO Locked failed: SC-1
2018-05-07 03:00:11.420 SC-2 osaffmd[284]: NO Active controller set to SC-2


Testing, Expected Results:
--------------------------
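See steps 3 and 4 above: the takeover request is accepted, and the
unlock performed while demoting SC-1 succeeds.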

Conditions of Submission:
-------------------------
Ack from any reviewer or in one week

Arch      Built     Started    Linux distro
-------------------------------------------
mips        n          n
mips64      n          n
x86         n          n
x86_64      y          y 
powerpc     n          n
powerpc64   n          n


Reviewer Checklist:
-------------------
[Submitters: make sure that your review doesn't trigger any checkmarks!]


Your checkin has not passed review because (see checked entries):

___ Your RR template is generally incomplete; it has too many blank entries
    that need proper data filled in.

___ You have failed to nominate the proper persons for review and push.

___ Your patches do not have proper short+long header

___ You have grammar/spelling in your header that is unacceptable.

___ You have exceeded a sensible line length in your headers/comments/text.

___ You have failed to put in a proper Trac Ticket # into your commits.

___ You have incorrectly put/left internal data in your comments/files
    (i.e. internal bug tracking tool IDs, product names etc)

___ You have not given any evidence of testing beyond basic build tests.
    Demonstrate some level of runtime or other sanity testing.

___ You have ^M present in some of your files. These have to be removed.

___ You have needlessly changed whitespace or added whitespace crimes
    like trailing spaces, or spaces before tabs.

___ You have mixed real technical changes with whitespace and other
    cosmetic code cleanup changes. These have to be separate commits.

___ You need to refactor your submission into logical chunks; there is
    too much content in a single commit.

___ You have extraneous garbage in your review (merge commits etc)

___ You have giant attachments which should never have been sent;
    Instead you should place your content in a public tree to be pulled.

___ You have too many commits attached to an e-mail; resend as threaded
    commits, or place in a public tree for a pull.

___ You have resent this content multiple times without a clear indication
    of what has changed between each re-send.

___ You have failed to adequately and individually address all of the
    comments and change requests that were proposed in the initial review.

___ You have a misconfigured ~/.gitconfig file (i.e. user.name, user.email etc)

___ Your computer has a badly configured date and time, confusing the
    threaded patch review.

___ Your changes affect IPC mechanism, and you don't present any results
    for in-service upgradability test.

___ Your changes affect the user manual and documentation, but your patch
    series does not contain the patch that updates the Doxygen manual.


_______________________________________________
Opensaf-devel mailing list
Opensaf-devel@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/opensaf-devel
