Summary: IMM: 2PBE
Review request for Trac Ticket(s): #21
Peer Reviewer(s): Neel
Pull request to:
Affected branch(es): default(4.4)
Development branch:

--------------------------------
Impacted area       Impact y/n
--------------------------------
 Docs                    n
 Build system            n
 RPM/packaging           n
 Configuration files     n
 Startup scripts         n
 SAF services            y
 OpenSAF services        n
 Core libraries          n
 Samples                 n
 Tests                   n
 Other                   n


Comments (indicate scope for each "y" above):
---------------------------------------------

changeset 61de45652342df8ea99de81aada2758a120a1613
Author: Anders Bjornerstedt <anders.bjornerst...@ericsson.com>
Date:   Thu, 10 Oct 2013 21:27:32 +0200

    IMM: 2PBE patch-1 (loading) [#21]

    This patch contains the 2PBE loading mechanism, needed to support 2PBE.
    The IMMDs will detect 2PBE loading by the IMMSV_2PBE_PEER_SC_MAX_WAIT
    environment variable being set in the immd.conf file.

    The active IMMD will order each SC IMMND to execute a "preload", probing
    the SC local filesystem for the file state that would be loaded to the
    cluster if that IMMND was chosen as coord. The IMMND sends these stats to
    the active IMMD. The IMMD will wait for the IMMNDs at *both* SCs to
    complete this task and then determine which SC has the apparently latest
    file state. The IMMND at that SC will then be chosen as IMMND coord.
    Actual loading then proceeds in the same way as for regular 1PBE.

    The IMMSV_2PBE_PEER_SC_MAX_WAIT is by default 30 seconds. This value
    should be high enough to make it very unlikely that the active IMMD is
    forced to choose a loader when only a single SC IMMND has joined. If that
    happens, then the risk is that the cluster restart will be done *not*
    using the latest persistent imm state, effectively rewinding the imm
    state. (Note that the same thing will happen with regular 1PBE based on a
    shared filesystem (DRBD) if one SC fails to come up in time to join the
    DRBD sync protocol. The corresponding DRBD timeout is on the order of 20
    seconds.)

    When loading has completed, additional 2PBE functionality will start two
    PBEs, one at each SC. That functionality is delivered in subsequent
    patches.
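
The loader selection described for patch-1 can be sketched roughly as below. This is only an illustration of the rule (wait for both SC IMMNDs unless the max wait has expired, then pick the apparently latest file state), not the code in immd_proc.c. The names PreloadStats and choose_loader, and the fields epoch, max_ccb_id and max_commit_time, are assumptions made for the example; the actual contents of the preload stats are not specified in this mail.

```python
from dataclasses import dataclass
from typing import Optional, Sequence


@dataclass(frozen=True)
class PreloadStats:
    """Hypothetical summary of one SC's local file state (field names assumed)."""
    node: str
    epoch: int
    max_ccb_id: int
    max_commit_time: int


def choose_loader(reports: Sequence[PreloadStats],
                  expected_scs: int = 2,
                  waited_secs: float = 0.0,
                  max_wait_secs: float = 30.0) -> Optional[PreloadStats]:
    """Return the report of the SC whose IMMND should become coord.

    Returns None while the active IMMD should keep waiting, i.e. when not
    all SCs have reported and the max wait has not yet expired.
    """
    if not reports:
        return None
    if len(reports) < expected_scs and waited_secs < max_wait_secs:
        return None  # peer SC may still join; do not pick a loader yet
    # Assumed ordering of "latest": epoch first, then CCB id, then commit time.
    return max(reports,
               key=lambda r: (r.epoch, r.max_ccb_id, r.max_commit_time))
```

With both reports present the SC with the newer state wins regardless of the timer. With a single report the function only returns that SC once the wait has reached the limit, which is the case where the risk of rewinding the imm state applies.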

changeset 8e597befaf60b26aa840a7c7bad1b4b2d220f752
Author: Anders Bjornerstedt <anders.bjornerst...@ericsson.com>
Date:   Fri, 11 Oct 2013 01:17:44 +0200

    IMM: 2PBE patch-2 (dumping) [#21]

    This patch contains the 2PBE dumping mechanism, needed to support 2PBE.
    A PBE process is started by the IMMND at each SC, not just the IMMND
    coord. The PBE colocated with the IMMND coord, called the primary PBE, is
    still the coordinator for transaction commits (for CCBs, PRT operations
    and class-create/deletes). The primary PBE (sometimes called the A-side
    PBE) works very much in the same way as the regular single PBE does in
    1-PBE. The PBE colocated with the SC resident but non-coord IMMND is
    called the slave PBE (sometimes called the B-side PBE).

    With 2PBE, *both* PBEs must be available for the imm to be
    persistent-writable. If one or both PBEs are unavailable (or
    unresponsive) then persistent writes (CCBs and PRT operations) will fail.

    In 2PBE, a restarted PBE will more often need to regenerate the sqlite
    file. On the other hand, regeneration of the sqlite file should be faster
    in 2PBE than in regular 1PBE because the file is typically placed on a
    local file system.

    A subsequent patch will provide a mechanism for allowing 1-safe-2PBE.
    This will allow the imm to open up for persistent writes when only one of
    the two PBEs is available. This will only be allowed during the absence
    of an SC. As soon as the other SC rejoins, the IMM has to re-enter the
    non-persistent-writable state.

changeset 9efd67f95ad153edf224af41e999352853ae1a3f
Author: Anders Bjornerstedt <anders.bjornerst...@ericsson.com>
Date:   Fri, 11 Oct 2013 02:08:51 +0200

    IMM: 2PBE patch-3 (1safe2pbe) [#21]

    This patch contains the 2PBE 1safe2PBE mechanism. This mechanism allows
    an OpenSAF cluster to open up for persistent writes using only one of the
    two PBEs - temporarily. This is only intended to be used as an emergency
    action when one SC is unavailable long term, e.g. due to hardware
    problems.
    As soon as the other SC returns, the IMM has to re-enter normal 2-safe
    2PBE and reject persistent writes until the slave PBE has synced
    (regenerated its sqlite file) and rejoined the cluster.

    The 1safe2PBE state is entered by the administrative operation:

    immadm -o 1 -p opensafImmNostdFlags:SA_UINT32_T:8 \
       opensafImm=opensafImm,safApp=safImmService

    It is exited either automatically by a rejoined SC or by an explicit
    administrative operation:

    immadm -o 2 -p opensafImmNostdFlags:SA_UINT32_T:8 \
       opensafImm=opensafImm,safApp=safImmService


Complete diffstat:
------------------
 osaf/libs/agents/saf/imma/imma_db.c              |    4 +-
 osaf/libs/agents/saf/imma/imma_oi_api.c          |    4 +-
 osaf/libs/agents/saf/imma/imma_proc.c            |   34 +-
 osaf/libs/common/immsv/immpbe_dump.cc            |  148 ++++-
 osaf/libs/common/immsv/immsv_evt.c               |   55 +-
 osaf/libs/common/immsv/include/immpbe_dump.hh    |   10 +-
 osaf/libs/common/immsv/include/immsv_api.h       |   21 +-
 osaf/libs/common/immsv/include/immsv_evt.h       |   18 +-
 osaf/libs/common/immsv/include/immsv_evt_model.h |    4 +
 osaf/services/saf/immsv/immd/immd_amf.c          |    5 +-
 osaf/services/saf/immsv/immd/immd_cb.h           |    8 +-
 osaf/services/saf/immsv/immd/immd_db.c           |    3 +
 osaf/services/saf/immsv/immd/immd_evt.c          |   82 ++-
 osaf/services/saf/immsv/immd/immd_main.c         |   51 +-
 osaf/services/saf/immsv/immd/immd_mbcsv.c        |    2 +-
 osaf/services/saf/immsv/immd/immd_proc.c         |  229 +++++++-
 osaf/services/saf/immsv/immd/immd_proc.h         |    3 +-
 osaf/services/saf/immsv/immd/immd_sbevt.c        |   23 +-
 osaf/services/saf/immsv/immloadd/imm_loader.cc   |  323 ++++++++-
 osaf/services/saf/immsv/immloadd/imm_loader.hh   |    8 +-
 osaf/services/saf/immsv/immloadd/imm_pbe_load.cc |  224 ++++++-
 osaf/services/saf/immsv/immnd/ImmModel.cc        |  569 ++++++++++++++---
 osaf/services/saf/immsv/immnd/ImmModel.hh        |   24 +-
 osaf/services/saf/immsv/immnd/ImmSearchOp.cc     |    5 -
 osaf/services/saf/immsv/immnd/immnd_cb.h         |   10 +-
 osaf/services/saf/immsv/immnd/immnd_evt.c        |  569 ++++++++++++++++--
 osaf/services/saf/immsv/immnd/immnd_init.h       |   17 +-
 osaf/services/saf/immsv/immnd/immnd_main.c       |    3 +-
 osaf/services/saf/immsv/immnd/immnd_proc.c       |  317 ++++++++-
 osaf/services/saf/immsv/immpbed/immpbe.cc        |   77 +-
 osaf/services/saf/immsv/immpbed/immpbe.hh        |    4 +
 osaf/services/saf/immsv/immpbed/immpbe_daemon.cc | 2021 ++++++++++++++++++++++++++++++++++++++++++++++++----------------
 32 files changed, 3926 insertions(+), 949 deletions(-)


Testing Commands:
-----------------
Any test exercising persistent imm writes is relevant.


Testing, Expected Results:
--------------------------
I have not yet fixed the problem pointed out by Neelakanta in the pre-review.
This is where the slave PBE has problems rejoining as applier due to
interference with on-going CCBs. This is not a robustness or consistency
problem, but it could prolong the non-persistent-writable state for imm.
I will provide that fix either in conjunction with adjustments to these
patches from review comments or send it out in a separate review request.

Most important is to test non-interference with regular PBE.
But that will also be done by default I guess.


Conditions of Submission:
-------------------------
Ack from Neel/Oracle.
Preferably before Monday Oct 28.


Arch      Built     Started    Linux distro
-------------------------------------------
mips        n          n
mips64      n          n
x86         n          n
x86_64      n          n
powerpc     n          n
powerpc64   n          n


Reviewer Checklist:
-------------------
[Submitters: make sure that your review doesn't trigger any checkmarks!]


Your checkin has not passed review because (see checked entries):

___ Your RR template is generally incomplete; it has too many blank entries
    that need proper data filled in.

___ You have failed to nominate the proper persons for review and push.

___ Your patches do not have proper short+long header

___ You have grammar/spelling in your header that is unacceptable.

___ You have exceeded a sensible line length in your headers/comments/text.

___ You have failed to put in a proper Trac Ticket # into your commits.

___ You have incorrectly put/left internal data in your comments/files
    (i.e. internal bug tracking tool IDs, product names etc)

___ You have not given any evidence of testing beyond basic build tests.
    Demonstrate some level of runtime or other sanity testing.

___ You have ^M present in some of your files. These have to be removed.

___ You have needlessly changed whitespace or added whitespace crimes
    like trailing spaces, or spaces before tabs.

___ You have mixed real technical changes with whitespace and other
    cosmetic code cleanup changes. These have to be separate commits.

___ You need to refactor your submission into logical chunks; there is
    too much content in a single commit.

___ You have extraneous garbage in your review (merge commits etc)

___ You have giant attachments which should never have been sent;
    instead you should place your content in a public tree to be pulled.

___ You have too many commits attached to an e-mail; resend as threaded
    commits, or place in a public tree for a pull.

___ You have resent this content multiple times without a clear indication
    of what has changed between each re-send.

___ You have failed to adequately and individually address all of the
    comments and change requests that were proposed in the initial review.

___ You have a misconfigured ~/.hgrc file (i.e. username, email etc)

___ Your computer has a badly configured date and time, confusing
    the threaded patch review.

___ Your changes affect the IPC mechanism, and you don't present any results
    for an in-service upgradability test.

___ Your changes affect the user manual and documentation, but your patch
    series does not contain the patch that updates the Doxygen manual.