Summary: split-brain: select active SC from largest network partition V3 
[#2795] 
Review request for Ticket(s): 2795
Peer Reviewer(s): Anders, Ravi, Hans 
Pull request to: *** LIST THE PERSON WITH PUSH ACCESS HERE ***
Affected branch(es): develop
Development branch: ticket-2795
Base revision: 1c302a300e449e8a8527671fbd6c7f4e2b41e95d
Personal repository: git://git.code.sf.net/u/userid-2226215/review

--------------------------------
Impacted area       Impact y/n
--------------------------------
 Docs                    n
 Build system            n
 RPM/packaging           n
 Configuration files     n
 Startup scripts         n
 SAF services            n
 OpenSAF services        y 
 Core libraries          y 
 Samples                 n
 Tests                   n
 Other                   n


Comments (indicate scope for each "y" above):
---------------------------------------------

*** Changes from V2: ***

fmd: made cluster_size atomic
fmd: wait 3 seconds before promoting to active, to allow topology events to be 
processed first
osaf: add check for existing takeover request, before trying to lock
etcdv3 plugin: reliablity improvements


revision c7bc78656d5de11f6147727bd8612274fb6e438f
Author: Gary Lee <gary....@dektech.com.au>
Date:   Wed, 11 Apr 2018 17:16:46 +1000

rded: adapt to new Consensus API [#2795]

- add 3 new internal message:

RDE_MSG_NODE_UP
RDE_MSG_NODE_DOWN
RDE_MSG_TAKEOVER_REQUEST_CALLBACK

- subscribe to AMFND service up events to keep track of the number
  of cluster members

- listen for takeover requests in KV store



revision 4899e5d0f5abdff8f15eca8ad17d3b13b6a00393
Author: Gary Lee <gary....@dektech.com.au>
Date:   Wed, 11 Apr 2018 17:16:18 +1000

fmd: adapt to new Consensus API [#2795]



revision 812a315af21df06b2f9fdcc3d8fd5b7bbad3e550
Author: Gary Lee <gary....@dektech.com.au>
Date:   Wed, 11 Apr 2018 17:15:41 +1000

amfd: adapt to new Consensus API [#2795]



revision b8a37c1b8965826e5faffbfebc44a84bdb6433a1
Author: Gary Lee <gary....@dektech.com.au>
Date:   Wed, 11 Apr 2018 17:14:39 +1000

osaf: add lock takeover request fuction [#2795]

- add create and set (if previous value matches) functions to KeyValue class
- add Consensus::MonitorTakeoverRequest() function for use by RDE to answer 
takeover requests
- add Consensus::CreateTakeoverRequest() - before a SC is promoted to active, 
it will
  create a takeover request in the KV store. An existing SC can reject the lock 
takeover



revision 955be872ba5887b1b521eac9f7732dd3f6afc593
Author: Gary Lee <gary....@dektech.com.au>
Date:   Wed, 11 Apr 2018 17:13:45 +1000

osaf: extend API to include a create key and an enhanced set key function 
[#2795]

- add create_key function (fails if key already exists)
- add setkey_match_prev function (set value if previous value matches)
- add missing quotes
- add etcd3.plugin



Added Files:
------------
 src/osaf/consensus/plugins/etcd3.plugin


Complete diffstat:
------------------
 src/amf/amfd/role.cc                     |   2 +-
 src/fm/fmd/fm_cb.h                       |   2 +-
 src/fm/fmd/fm_main.cc                    |  26 +-
 src/fm/fmd/fm_mds.cc                     |   2 +
 src/fm/fmd/fm_rda.cc                     |  27 +-
 src/osaf/consensus/consensus.cc          | 435 ++++++++++++++++++++++++++-----
 src/osaf/consensus/consensus.h           |  55 +++-
 src/osaf/consensus/key_value.cc          | 105 +++++---
 src/osaf/consensus/key_value.h           |  19 +-
 src/osaf/consensus/plugins/etcd.plugin   |  86 +++++-
 src/osaf/consensus/plugins/etcd3.plugin  | 366 ++++++++++++++++++++++++++
 src/osaf/consensus/plugins/sample.plugin |  67 ++++-
 src/rde/rded/rde_cb.h                    |  12 +-
 src/rde/rded/rde_main.cc                 |  75 ++++--
 src/rde/rded/rde_mds.cc                  |  39 ++-
 src/rde/rded/rde_rda.cc                  |   2 +-
 src/rde/rded/role.cc                     |  46 +++-
 src/rde/rded/role.h                      |   2 +-
 18 files changed, 1180 insertions(+), 188 deletions(-)


Testing Commands:
-----------------
1) SI swap of safSi=SC-2N,safApp=OpenSAF
2) Isolate standby cluster (eg. use iptables to block port 6700 on a TCP system)
3) Isolate active cluster

Testing, Expected Results:
--------------------------
1) No error
2) Standby will fail to be promoted as active as the takeover request is 
rejected
3) Standby will be promoted

Conditions of Submission:
-------------------------
Ack from any reviewer


Arch      Built     Started    Linux distro
-------------------------------------------
mips        n          n
mips64      n          n
x86         n          n
x86_64      y          y 
powerpc     n          n
powerpc64   n          n


Reviewer Checklist:
-------------------
[Submitters: make sure that your review doesn't trigger any checkmarks!]


Your checkin has not passed review because (see checked entries):

___ Your RR template is generally incomplete; it has too many blank entries
    that need proper data filled in.

___ You have failed to nominate the proper persons for review and push.

___ Your patches do not have proper short+long header

___ You have grammar/spelling in your header that is unacceptable.

___ You have exceeded a sensible line length in your headers/comments/text.

___ You have failed to put in a proper Trac Ticket # into your commits.

___ You have incorrectly put/left internal data in your comments/files
    (i.e. internal bug tracking tool IDs, product names etc)

___ You have not given any evidence of testing beyond basic build tests.
    Demonstrate some level of runtime or other sanity testing.

___ You have ^M present in some of your files. These have to be removed.

___ You have needlessly changed whitespace or added whitespace crimes
    like trailing spaces, or spaces before tabs.

___ You have mixed real technical changes with whitespace and other
    cosmetic code cleanup changes. These have to be separate commits.

___ You need to refactor your submission into logical chunks; there is
    too much content into a single commit.

___ You have extraneous garbage in your review (merge commits etc)

___ You have giant attachments which should never have been sent;
    Instead you should place your content in a public tree to be pulled.

___ You have too many commits attached to an e-mail; resend as threaded
    commits, or place in a public tree for a pull.

___ You have resent this content multiple times without a clear indication
    of what has changed between each re-send.

___ You have failed to adequately and individually address all of the
    comments and change requests that were proposed in the initial review.

___ You have a misconfigured ~/.gitconfig file (i.e. user.name, user.email etc)

___ Your computer have a badly configured date and time; confusing the
    the threaded patch review.

___ Your changes affect IPC mechanism, and you don't present any results
    for in-service upgradability test.

___ Your changes affect user manual and documentation, your patch series
    do not contain the patch that updates the Doxygen manual.


------------------------------------------------------------------------------
Check out the vibrant tech community on one of the world's most
engaging tech sites, Slashdot.org! http://sdm.link/slashdot
_______________________________________________
Opensaf-devel mailing list
Opensaf-devel@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/opensaf-devel

Reply via email to