Hi AndersBj,
Reviewed the first version of 2PBE patch. following are the comments:
1. while applying the patch is failed with one rejection:
1 out of 13 hunks FAILED -- saving rejects to file
osaf/services/saf/immsv/immnd/immnd_evt.c.rej
The #355 LOG_NO anf the if condition is changed in the #355. The
following needs to be removed from the patch:
@@ -3301,7 +3304,7 @@ static SaAisErrorT immnd_fevs_local_chec
immnd_evt_destroy(&frwrd_evt, SA_FALSE, __LINE__);
if ((error != SA_AIS_OK) && (error != SA_AIS_ERR_NO_BINDINGS)) {
- LOG_NO("Precheck of fevs message failed ERROR:%u", error);
+ LOG_NO("Precheck of fevs message type %u failed
ERROR:%u", frwrd_evt.info.immnd.type, error);
}
TRACE_LEAVE();
2. Typo, ctgrl would have be ctrl:
+ node accpet message for each SC. Subsequent may have
ctgrl->canBeCoord
3. General comments:
a. with this patch both controllers must have any one of the files
like imm.db/immdb.nodeID/imm.xml for 2PBE configuration.
b. If all the parameters are same why is the remote IMMND is
choosen as co-ordinator?
4. SC-1 has imm.db and SC-2 has imm.xml. Both the nodes 2PBE is configured.
case 1:
a. SC-1(slot-3) is LOADING_SERVER and SC-2(slot-4) is LOADING_CLIENT
b. SC-2 went for preload state.
c. SC-2 after sending parameters to IMMD called AdminownerFinalize.
Adminownerfinalize got timeout
Jun 13 17:51:54 Slot-4 osafimmloadd: WA Failed on
saImmOmAdminOwnerFinalize (5)
Jun 13 17:51:52 Slot-3 osafimmnd[29960]: NO ERR_BAD_HANDLE: Admin
owner 2 does not exist
d. The adminownerFinalize event is discarded because the SC-2 IMMND
is not in maccepted state.
Jun 13 17:51:44.419650 osafimmnd [29784:immnd_evt.c:8029] T2
DISCARDING msg no:6
Jun 13 17:51:44.419670 osafimmnd [29784:immnd_evt.c:8035] TR ABT
cb->highestProcessed:5
Jun 13 17:51:44.419690 osafimmnd [29784:immnd_evt.c:8037] TR ABT
cb->highestProcessed INCREMENTED TO:6
case 2:
If SC-1 is started first and SC-2 is started next
a. starting of SC-1 immediatly crestes adminowner at that time sc-2
is not there.
b. when AdminOwnerfinalize is called for the preload in SC-1, at
the SC-2 will give error as bad_handle.
Jun 13 18:40:44 Slot-4 osafimmd[1655]: NO IMMND coord at 2010f
Jun 13 18:40:44 Slot-4 osafimmnd[1671]: NO ERR_BAD_HANDLE: Admin
owner 1 does not exist
e. ERR_BAD_HANDLE is because, in the when the node comes up,
immediatly admin
5. SC-1 has imm.db and SC-2 has imm.db with 2PBE configured.
In this case also the message is getting discarded, but the print is
not there to syslog because in this case the SC-2 reads from imm.db.
6. In the 2PBE case the checking of immnd cofiguration file must be
bypassed. Having different imm.xml, imm.db and directory names are valid
in 2PBE.
/Neel.
On Thursday 30 May 2013 06:26 PM, Anders Bjornerstedt wrote:
> Summary: IMM: Prototype for 2PBE loading (test patch) [#21]
> Review request for Trac Ticket(s): 21
> Peer Reviewer(s): Neel, Zoran
> Pull request to:
> Affected branch(es): none
> Development branch:
>
> --------------------------------
> Impacted area Impact y/n
> --------------------------------
> Docs n
> Build system n
> RPM/packaging n
> Configuration files n
> Startup scripts n
> SAF services y
> OpenSAF services n
> Core libraries n
> Samples n
> Tests n
> Other n
>
>
> Comments (indicate scope for each "y" above):
> ---------------------------------------------
>
> changeset 617845e0e935da9e5968e4b85673902aedd06642
> Author: Anders Bjornerstedt <[email protected]>
> Date: Thu, 30 May 2013 14:08:10 +0200
>
> IMM: Prototype for 2PBE loading (test patch) [#21]
>
> This is a testpatch containing a first version of the 2PBE loading
> mechanism. This patch goes on top of the patch for [#17] the file
> configuration consistency checking. See separate review request mail for
> instructions. This patch will not be pushed. The intent with this patch
> is
> to allow testing and obtaining feedback on the 2PBE loading.
>
> This is the "separate review request mail".
> To test 2PBE loading do as folows:
>
> 1) Ensure that the patch for [#17] and this patch apply successfully on the
> default branch.
> 2) Ensure that opensaf builds with configure for --enable-imm-pbe.
> 3) Verify that opensaf starts and loads without pbe configured in immnd.conf.
> 4) Configure immnd.conf for regular PBE.
> 5) Verify that opensaf starts and works normally for regular PBE. Stop
> opensaf.
> 6) Configure immd.conf (not immnd.conf) for 2PBE by commenting in the
> 'export IMMSV_PEER_SC_MAX_WAIT=30' environment variable.
> 7) Start opensaf.
>
> You should now see loading arbitration being done by the active IMMD.
> This consists of ordering a "preload" at both SC1 and SC2 concurrently.
> That preload will probe the filesystem; determine which file to be loaded
> from locally; scan that file (xml or sqlite) for the parameters used in
> loading arbitration and fially send the paramters to the IMMD via an
> admin-operation.
> When the IMMD has received the loading arbitration parameters from both
> SCs, it will compare them and decide which SC to perform the loading.
> The IMMD then elects the IMMND coord to be at the SC that "won" the
> arbitration contest. Loading is then done in the normal way.
>
> After loading has completed, a PBE will be started only at the coord
> IMMND. Thus this patch does not provide the logic for full 2PBE and
> in particular not the 2 part of the PBE dumping.
> So after loading, the system works almost as normal 1PBE, with one
> difference. The file name used by the immpbedd process is not just
> the regular "imm.db" file. The file name used is the configued
> file name (imm.db) with the processor id appended as a suffix.
> Thus from SC1 you will see 'imm.db.2010f' and from SC2 you will see
> 'imm.db.2020f'.
> Of course if you run this on a shared file system then you may see both
> files. One reason for appending the processor id to the pbe file is to allow
> testing of 2PBE on systems currently prepared only for 1PBE. Thus 2PBE
> should work also when the PBE files are placed on a shared file system.
> It would of course be silly to have such a configuration in production.
> But for testing it may be convenient.
>
> One danger has to be pointed out with testing of 2PBE loading when not
> having the 2PBE dumping in place, is that you have to be very carefull
> with failover/switchover of SC or failover of immnd-coord.
> When the immnd-coord moves from one SC ot the other in 1PBE, it tries to
> avoid regenerating the imm.db file and instead re-attaches to the pre
> existing imm.db file. With this incomplete 2PBE setup however, the PBE
> will indeed failover, but if it re-attaches it will re-attach to the
> imm.db.suffix for that matches the SC it is executing on.
> The imm.db.suffix file it re-attaches to may be completely out of sync
> since any CCBS or other persistent writes done prior to the failover would
> only have reached the other imm.db.suffix file.
> So *if* you are going to perform any test that implies a failover of immnd
> coord, then you must remover or rename the imm.db.suffix file at SC where
> the new immnd coord (and immpbedd) will start. This will force the immpbedd
> to regenerate the file.
>
> When testing loading you may also experiment with updating the epoch or
> ccb-id in the sqlite file or the xml file.
>
> The file chosen for pre-loading and loading is in order of precedence:
> 1) imm.db.<suffix>
> 2) imm.db
> 3) imm.xml
>
> If it cant find opne file or fails to open it, it tries the next.
> But if it succeeds in opening it and preloading starts but fails,
> then there is no attempt to look for the next file. Instead such
> a failure is regarded as a corrupt file setup.
>
> The loading arbitration done at the IMMD is based on:
> 1) epoch
> 2) regular ccb-id.
> 3) weak-ccb-id (a fake ccb-id for PRT updates and class updates).
> 4) regular commit-time.
> 5) weak commit-time.
>
> The first level with a difference will decide.
> If there is no difference (would be the normal case in a well functioning
> system),
> then it arbitrarily chooses the coord at the standby SC (offloads the active
> SC
> a tiny bit).
>
> Please do not hesitate to ask questions.
> The loading paramters may be extended a bit later.
>
>
> Complete diffstat:
> ------------------
> osaf/libs/agents/saf/imma/imma_proc.c | 33 ++++++-
> osaf/libs/common/immsv/immsv_evt.c | 55 ++++++++++++-
> osaf/libs/common/immsv/include/immsv_api.h | 2 +-
> osaf/libs/common/immsv/include/immsv_evt.h | 13 +++-
> osaf/services/saf/immsv/immd/immd_amf.c | 5 +-
> osaf/services/saf/immsv/immd/immd_cb.h | 6 +-
> osaf/services/saf/immsv/immd/immd_db.c | 2 +
> osaf/services/saf/immsv/immd/immd_evt.c | 82 +++++++++++++++++++-
> osaf/services/saf/immsv/immd/immd_main.c | 40 ++++++++++-
> osaf/services/saf/immsv/immd/immd_proc.c | 229
> +++++++++++++++++++++++++++++++++++++++++++++++++++++++++-
> osaf/services/saf/immsv/immd/immd_proc.h | 3 +-
> osaf/services/saf/immsv/immd/immd_sbevt.c | 23 +++++-
> osaf/services/saf/immsv/immloadd/imm_loader.cc | 260
> ++++++++++++++++++++++++++++++++++++++++++++++++++++++------------
> osaf/services/saf/immsv/immloadd/imm_loader.hh | 6 +-
> osaf/services/saf/immsv/immloadd/imm_pbe_load.cc | 190
> ++++++++++++++++++++++++++++++++++++++++++++++--
> osaf/services/saf/immsv/immnd/ImmModel.cc | 53 ++++++++++++-
> osaf/services/saf/immsv/immnd/immnd_cb.h | 2 +
> osaf/services/saf/immsv/immnd/immnd_evt.c | 121
> ++++++++++++++++++++++++++++++-
> osaf/services/saf/immsv/immnd/immnd_main.c | 2 +-
> osaf/services/saf/immsv/immnd/immnd_proc.c | 91
> +++++++++++++++++++---
> osaf/services/saf/immsv/immpbed/immpbe_daemon.cc | 19 +++-
> 21 files changed, 1116 insertions(+), 121 deletions(-)
>
>
> Testing Commands:
> -----------------
> <<LIST THE COMMAND LINE TOOLS/STEPS TO TEST YOUR CHANGES>>
>
>
> Testing, Expected Results:
> --------------------------
> <<PASTE COMMAND OUTPUTS / TEST RESULTS>>
>
>
> Conditions of Submission:
> -------------------------
> <<HOW MANY DAYS BEFORE PUSHING, CONSENSUS ETC>>
>
>
> Arch Built Started Linux distro
> -------------------------------------------
> mips n n
> mips64 n n
> x86 n n
> x86_64 n n
> powerpc n n
> powerpc64 n n
>
>
> Reviewer Checklist:
> -------------------
> [Submitters: make sure that your review doesn't trigger any checkmarks!]
>
>
> Your checkin has not passed review because (see checked entries):
>
> ___ Your RR template is generally incomplete; it has too many blank entries
> that need proper data filled in.
>
> ___ You have failed to nominate the proper persons for review and push.
>
> ___ Your patches do not have proper short+long header
>
> ___ You have grammar/spelling in your header that is unacceptable.
>
> ___ You have exceeded a sensible line length in your headers/comments/text.
>
> ___ You have failed to put in a proper Trac Ticket # into your commits.
>
> ___ You have incorrectly put/left internal data in your comments/files
> (i.e. internal bug tracking tool IDs, product names etc)
>
> ___ You have not given any evidence of testing beyond basic build tests.
> Demonstrate some level of runtime or other sanity testing.
>
> ___ You have ^M present in some of your files. These have to be removed.
>
> ___ You have needlessly changed whitespace or added whitespace crimes
> like trailing spaces, or spaces before tabs.
>
> ___ You have mixed real technical changes with whitespace and other
> cosmetic code cleanup changes. These have to be separate commits.
>
> ___ You need to refactor your submission into logical chunks; there is
> too much content into a single commit.
>
> ___ You have extraneous garbage in your review (merge commits etc)
>
> ___ You have giant attachments which should never have been sent;
> Instead you should place your content in a public tree to be pulled.
>
> ___ You have too many commits attached to an e-mail; resend as threaded
> commits, or place in a public tree for a pull.
>
> ___ You have resent this content multiple times without a clear indication
> of what has changed between each re-send.
>
> ___ You have failed to adequately and individually address all of the
> comments and change requests that were proposed in the initial review.
>
> ___ You have a misconfigured ~/.hgrc file (i.e. username, email etc)
>
> ___ Your computer have a badly configured date and time; confusing the
> the threaded patch review.
>
> ___ Your changes affect IPC mechanism, and you don't present any results
> for in-service upgradability test.
>
> ___ Your changes affect user manual and documentation, your patch series
> do not contain the patch that updates the Doxygen manual.
>
------------------------------------------------------------------------------
This SF.net email is sponsored by Windows:
Build for Windows Store.
http://p.sf.net/sfu/windows-dev2dev
_______________________________________________
Opensaf-devel mailing list
[email protected]
https://lists.sourceforge.net/lists/listinfo/opensaf-devel