Summary: IMM: 2PBE
Review request for Trac Ticket(s): #21
Peer Reviewer(s): Neel
Pull request to: 
Affected branch(es): default(4.4)
Development branch:

--------------------------------
Impacted area       Impact y/n
--------------------------------
 Docs                    n
 Build system            n
 RPM/packaging           n
 Configuration files     n
 Startup scripts         n
 SAF services            y
 OpenSAF services        n
 Core libraries          n
 Samples                 n
 Tests                   n
 Other                   n


Comments (indicate scope for each "y" above):
---------------------------------------------

changeset 61de45652342df8ea99de81aada2758a120a1613
Author: Anders Bjornerstedt <anders.bjornerst...@ericsson.com>
Date:   Thu, 10 Oct 2013 21:27:32 +0200

        IMM: 2PBE patch-1 (loading) [#21]

        This patch contains the 2PBE loading mechanism, needed to support
        2PBE. The IMMDs will detect 2PBE loading by the
        IMMSV_2PBE_PEER_SC_MAX_WAIT environment variable being set in the
        immd.conf file. The active IMMD will order each SC IMMND to execute a
        "preload", probing the SC local filesystem for the file state that
        would be loaded to the cluster if that IMMND was chosen as coord. The
        IMMND sends these stats to the active IMMD.

        The IMMD will wait for the IMMNDs at *both* SCs to complete this task
        and then determine which SC has the apparently latest file state. The
        IMMND at that SC will then be chosen as IMMND coord. Actual loading
        then proceeds in the same way as for regular 1PBE. The
        IMMSV_2PBE_PEER_SC_MAX_WAIT is by default 30 seconds. This value
        should be high enough to make it very unlikely that the active IMMD is
        forced to choose a loader when only a single SC IMMND has joined. If
        that happens, then the risk is that the cluster restart will be done
        *not* using the latest persistent imm state, effectively rewinding the
        imm state. (Note that the same thing will happen with regular 1PBE
        based on a shared filesystem (DRBD) if one SC fails to come up in time
        to join the DRBD sync protocol. The corresponding DRBD timeout is on
        the order of 20 seconds.)
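
        The coord selection described above can be sketched roughly as
        follows. This is only an illustration in Python, not the code in
        immd_proc.c. The PreloadStats fields (epoch, last_commit), the
        function name choose_coord and the expected_scs parameter are
        assumptions made for the example; the patch only says that the IMMNDs
        report "stats" and that the apparently latest file state wins.

```python
from dataclasses import dataclass
from typing import Dict, Optional

DEFAULT_PEER_SC_MAX_WAIT = 30  # seconds, the default stated in the patch text


@dataclass(frozen=True)
class PreloadStats:
    """Hypothetical summary of one SC's local file state (fields assumed)."""
    epoch: int        # assumed: a higher value means a newer state
    last_commit: int  # assumed: tie-breaker within the same epoch


def choose_coord(stats: Dict[str, PreloadStats], waited: float,
                 max_wait: float = DEFAULT_PEER_SC_MAX_WAIT,
                 expected_scs: int = 2) -> Optional[str]:
    """Return the SC whose IMMND should become coord, or None to keep waiting."""
    if not stats:
        return None  # no IMMND has reported preload stats yet
    if len(stats) < expected_scs and waited < max_wait:
        return None  # still hoping for the peer SC to report
    # Either both SCs reported, or the wait expired with a single SC:
    # pick the apparently latest state among those that did report.
    return max(sorted(stats),
               key=lambda sc: (stats[sc].epoch, stats[sc].last_commit))
```

        The last branch is the risky case mentioned above: if the wait expires
        with only one SC reported, that SC is chosen even though the absent
        SC may hold a later state.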

        When loading has completed, additional 2PBE functionality will start
        two PBEs, one at each SC. That functionality is delivered in
        subsequent patches.

changeset 8e597befaf60b26aa840a7c7bad1b4b2d220f752
Author: Anders Bjornerstedt <anders.bjornerst...@ericsson.com>
Date:   Fri, 11 Oct 2013 01:17:44 +0200

        IMM: 2PBE patch-2 (dumping) [#21]

        This patch contains the 2PBE dumping mechanism, needed to support
        2PBE. A PBE process is started by the IMMND at each SC, not just the
        IMMND coord.

        The PBE colocated with the IMMND coord, called the primary PBE, is
        still the coordinator for transaction commits (for CCBs, PRT
        operations and class create/deletes). The primary PBE (sometimes
        called the A-side PBE) works in very much the same way as the regular
        single PBE does in 1PBE. The PBE colocated with the SC-resident but
        non-coord IMMND is called the slave PBE (sometimes called the B-side
        PBE).

        With 2PBE, *both* PBEs must be available for the imm to be
        persistent-writable. If one or both PBEs are unavailable (or
        unresponsive) then persistent writes (CCBs and PRT operations) will
        fail.

        In 2PBE, a restarted PBE will more often need to regenerate the sqlite
        file. On the other hand, regeneration of the sqlite file should be
        faster in 2PBE than in regular 1PBE because the file is typically
        placed on a local file system.

        A subsequent patch will provide a mechanism for allowing 1-safe-2PBE.
        This will allow the imm to open up for persistent writes when only one
        of the two PBEs is available. This will only be allowed during the
        absence of an SC. As soon as the other SC rejoins, the IMM has to
        re-enter the non-persistent-writable state.

changeset 9efd67f95ad153edf224af41e999352853ae1a3f
Author: Anders Bjornerstedt <anders.bjornerst...@ericsson.com>
Date:   Fri, 11 Oct 2013 02:08:51 +0200

        IMM: 2PBE patch-3 (1safe2pbe) [#21]

        This patch contains the 2PBE 1safe2PBE mechanism. This mechanism
        allows an OpenSAF cluster to open up for persistent writes using only
        one of the two PBEs, temporarily.

        This is only intended to be used as an emergency action when one SC is
        unavailable long term, e.g. due to hardware problems. As soon as the
        other SC returns, the IMM has to re-enter normal 2-safe 2PBE and
        reject persistent writes until the slave PBE has synced (regenerated
        its sqlite file) and rejoined the cluster.
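
        As a rough mental model of when persistent writes are accepted, the
        rules stated in these commit messages can be written down as a small
        state object. This is an illustrative Python sketch, not the logic in
        ImmModel.cc. The class and method names are hypothetical, and it
        assumes that the PBE surviving in 1safe2PBE is the primary one.

```python
from dataclasses import dataclass


@dataclass
class PbeState:
    """Toy model of 2PBE write availability (all names are hypothetical)."""
    primary_ok: bool = True
    slave_ok: bool = True
    one_safe: bool = False  # 1safe2PBE switched on by the operator

    def peer_sc_rejoined(self) -> None:
        # A returning SC ends 1safe2PBE. Its PBE still has to regenerate
        # its sqlite file, so it does not count as available yet.
        self.one_safe = False
        self.slave_ok = False

    def slave_synced(self) -> None:
        # The slave PBE has regenerated its file and rejoined.
        self.slave_ok = True

    def persistent_writable(self) -> bool:
        if self.one_safe:
            # Assumption: the single remaining PBE is the primary.
            return self.primary_ok
        # Normal 2-safe 2PBE: both PBEs are required.
        return self.primary_ok and self.slave_ok
```

        In this model a rejoining SC always closes the window for persistent
        writes until slave_synced() is reported, which matches the
        description above.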

        The 1safe2PBE state is entered by the administrative operation:

         immadm -o 1 -p opensafImmNostdFlags:SA_UINT32_T:8 \
        opensafImm=opensafImm,safApp=safImmService

        It is exited either automatically by a rejoined SC or by an explicit
        administrative operation:

         immadm -o 2 -p opensafImmNostdFlags:SA_UINT32_T:8 \
        opensafImm=opensafImm,safApp=safImmService
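
        For test scripts it may be convenient to build the two immadm
        invocations above from one helper. The following Python sketch only
        assembles the argument list shown in this commit message; the helper
        name one_safe_2pbe_cmd is made up for the example and nothing here is
        part of the patch.

```python
import shlex
from typing import List

# Both values are copied from the immadm commands in the commit message.
IMM_SERVICE_DN = "opensafImm=opensafImm,safApp=safImmService"
FLAG_PARAM = "opensafImmNostdFlags:SA_UINT32_T:8"


def one_safe_2pbe_cmd(enable: bool) -> List[str]:
    """Return the immadm argv that enters (True) or exits (False) 1safe2PBE."""
    op_id = "1" if enable else "2"  # admin operation id 1 enters, 2 exits
    return ["immadm", "-o", op_id, "-p", FLAG_PARAM, IMM_SERVICE_DN]


if __name__ == "__main__":
    # Print both commands as they would be typed in a shell.
    print(shlex.join(one_safe_2pbe_cmd(True)))
    print(shlex.join(one_safe_2pbe_cmd(False)))
```

        On a node where immadm is installed, the returned list could be passed
        to subprocess.run(..., check=True); that step is left out here since
        it needs a running cluster.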


Complete diffstat:
------------------
 osaf/libs/agents/saf/imma/imma_db.c              |     4 +-
 osaf/libs/agents/saf/imma/imma_oi_api.c          |     4 +-
 osaf/libs/agents/saf/imma/imma_proc.c            |    34 +-
 osaf/libs/common/immsv/immpbe_dump.cc            |   148 ++++-
 osaf/libs/common/immsv/immsv_evt.c               |    55 +-
 osaf/libs/common/immsv/include/immpbe_dump.hh    |    10 +-
 osaf/libs/common/immsv/include/immsv_api.h       |    21 +-
 osaf/libs/common/immsv/include/immsv_evt.h       |    18 +-
 osaf/libs/common/immsv/include/immsv_evt_model.h |     4 +
 osaf/services/saf/immsv/immd/immd_amf.c          |     5 +-
 osaf/services/saf/immsv/immd/immd_cb.h           |     8 +-
 osaf/services/saf/immsv/immd/immd_db.c           |     3 +
 osaf/services/saf/immsv/immd/immd_evt.c          |    82 ++-
 osaf/services/saf/immsv/immd/immd_main.c         |    51 +-
 osaf/services/saf/immsv/immd/immd_mbcsv.c        |     2 +-
 osaf/services/saf/immsv/immd/immd_proc.c         |   229 +++++++-
 osaf/services/saf/immsv/immd/immd_proc.h         |     3 +-
 osaf/services/saf/immsv/immd/immd_sbevt.c        |    23 +-
 osaf/services/saf/immsv/immloadd/imm_loader.cc   |   323 ++++++++-
 osaf/services/saf/immsv/immloadd/imm_loader.hh   |     8 +-
 osaf/services/saf/immsv/immloadd/imm_pbe_load.cc |   224 ++++++-
 osaf/services/saf/immsv/immnd/ImmModel.cc        |   569 ++++++++++++++---
 osaf/services/saf/immsv/immnd/ImmModel.hh        |    24 +-
 osaf/services/saf/immsv/immnd/ImmSearchOp.cc     |     5 -
 osaf/services/saf/immsv/immnd/immnd_cb.h         |    10 +-
 osaf/services/saf/immsv/immnd/immnd_evt.c        |   569 ++++++++++++++++--
 osaf/services/saf/immsv/immnd/immnd_init.h       |    17 +-
 osaf/services/saf/immsv/immnd/immnd_main.c       |     3 +-
 osaf/services/saf/immsv/immnd/immnd_proc.c       |   317 ++++++++-
 osaf/services/saf/immsv/immpbed/immpbe.cc        |    77 +-
 osaf/services/saf/immsv/immpbed/immpbe.hh        |     4 +
 osaf/services/saf/immsv/immpbed/immpbe_daemon.cc |  2021 ++++++++++++++++++++++++++++++++++++++++++++++++----------------
 32 files changed, 3926 insertions(+), 949 deletions(-)


Testing Commands:
-----------------
Any test exercising persistent imm writes is relevant.



Testing, Expected Results:
--------------------------
I have not yet fixed the problem pointed out by Neelakanta in the pre-review.
This is where the slave PBE has problems rejoining as applier due to
interference with on-going CCBs. This is not a robustness or consistency
problem, but it could prolong the non-persistent-writable state for imm.

I will provide that fix either in conjunction with adjustments to these
patches from review comments, or send it out in a separate review request.

Most important is to test non-interference with regular PBE. But that will also
be done by default I guess.


Conditions of Submission:
-------------------------
Ack from Neel/Oracle.
Preferably before Monday Oct 28.


Arch      Built     Started    Linux distro
-------------------------------------------
mips        n          n
mips64      n          n
x86         n          n
x86_64      n          n
powerpc     n          n
powerpc64   n          n


Reviewer Checklist:
-------------------
[Submitters: make sure that your review doesn't trigger any checkmarks!]


Your checkin has not passed review because (see checked entries):

___ Your RR template is generally incomplete; it has too many blank entries
    that need proper data filled in.

___ You have failed to nominate the proper persons for review and push.

___ Your patches do not have proper short+long header

___ You have grammar/spelling in your header that is unacceptable.

___ You have exceeded a sensible line length in your headers/comments/text.

___ You have failed to put in a proper Trac Ticket # into your commits.

___ You have incorrectly put/left internal data in your comments/files
    (i.e. internal bug tracking tool IDs, product names etc)

___ You have not given any evidence of testing beyond basic build tests.
    Demonstrate some level of runtime or other sanity testing.

___ You have ^M present in some of your files. These have to be removed.

___ You have needlessly changed whitespace or added whitespace crimes
    like trailing spaces, or spaces before tabs.

___ You have mixed real technical changes with whitespace and other
    cosmetic code cleanup changes. These have to be separate commits.

___ You need to refactor your submission into logical chunks; there is
    too much content into a single commit.

___ You have extraneous garbage in your review (merge commits etc)

___ You have giant attachments which should never have been sent;
    Instead you should place your content in a public tree to be pulled.

___ You have too many commits attached to an e-mail; resend as threaded
    commits, or place in a public tree for a pull.

___ You have resent this content multiple times without a clear indication
    of what has changed between each re-send.

___ You have failed to adequately and individually address all of the
    comments and change requests that were proposed in the initial review.

___ You have a misconfigured ~/.hgrc file (i.e. username, email etc)

___ Your computer has a badly configured date and time, confusing the
    threaded patch review.

___ Your changes affect IPC mechanism, and you don't present any results
    for in-service upgradability test.

___ Your changes affect user manual and documentation, your patch series
    do not contain the patch that updates the Doxygen manual.

