Hi Andersbj,

While testing, I am not able to proceed:

Case 1:
Neither 2PBE nor PBE is configured. I get the following error while
starting OpenSAF:

Nov  8 15:38:59 Slot-3 kernel: TIPC: Own node address <1.1.1>, network identity 1234
Nov  8 15:38:59 Slot-3 kernel: TIPC: Enabled bearer <eth:eth1>, discovery domain <1.1.0>, priority 10
Nov  8 15:38:59 Slot-3 osafrded[4713]: Started
Nov  8 15:39:01 Slot-3 osafrded[4713]: NO Peer not available => Active role
Nov  8 15:39:01 Slot-3 osaffmd[4727]: Started
Nov  8 15:39:01 Slot-3 osafimmd[4742]: Started
Nov  8 15:39:01 Slot-3 osafimmnd[4757]: Started
Nov  8 15:39:01 Slot-3 osafimmd[4742]: NO New IMMND process is on ACTIVE Controller at 2010f
Nov  8 15:39:01 Slot-3 osafimmd[4742]: NO First SC IMMND (OpenSAF 4.4 or later) attached 2010f
Nov  8 15:39:01 Slot-3 osafimmd[4742]: NO First IMMND at SC to attach is NOT configured for PBE
Nov  8 15:39:01 Slot-3 osafimmd[4742]: NO First IMMND on SC found at 2010f this IMMD at 2010f. Cluster is loading, *not* 2PBE => designating that IMMND as coordinator
Nov  8 15:39:01 Slot-3 osafimmnd[4757]: NO SERVER STATE: IMM_SERVER_ANONYMOUS --> IMM_SERVER_CLUSTER_WAITING
Nov  8 15:39:01 Slot-3 osafimmnd[4757]: NO This IMMND is now the NEW Coord
Nov  8 15:39:04 Slot-3 osafimmnd[4757]: NO SERVER STATE: IMM_SERVER_CLUSTER_WAITING --> IMM_SERVER_LOADING_PENDING
Nov  8 15:39:04 Slot-3 osafimmd[4742]: NO Successfully announced loading. New ruling epoch:1
Nov  8 15:39:04 Slot-3 osafimmnd[4757]: NO SERVER STATE: IMM_SERVER_LOADING_PENDING --> IMM_SERVER_LOADING_SERVER
Nov  8 15:39:04 Slot-3 osafimmnd[4757]: NO NODE STATE-> IMM_NODE_LOADING
Nov  8 15:39:04 Slot-3 osafimmnd[4757]: ER LOADING APPARENTLY FAILED status:0
Nov  8 15:39:04 Slot-3 osafimmd[4742]: ER ******** LOADING FAILED. File(s) possibly missing, inaccessible or corrupt .. ? *********
Nov  8 15:39:04 Slot-3 opensafd[4681]: ER Failed   DESC:IMMND
Nov  8 15:39:04 Slot-3 opensafd[4681]: ER Going for recovery
Nov  8 15:39:04 Slot-3 opensafd[4681]: ER Trying To RESPAWN /usr/local/lib/opensaf/clc-cli/osaf-immnd attempt #1
Nov  8 15:39:04 Slot-3 opensafd[4681]: ER Sending SIGKILL to IMMND, pid=4749
Nov  8 15:39:04 Slot-3 osafimmnd[4757]: ER IMMND - Periodic server job failed


Case 2:
Both 2PBE and PBE are configured, and no imm.db file is present:

Nov  8 15:42:10 Slot-4 kernel: TIPC: Enabled bearer <eth:eth1>, discovery domain <1.1.0>, priority 10
Nov  8 15:42:10 Slot-4 osafrded[3983]: Started
Nov  8 15:42:12 Slot-4 osafrded[3983]: NO Peer not available => Active role
Nov  8 15:42:12 Slot-4 osaffmd[3997]: Started
Nov  8 15:42:12 Slot-4 osafimmd[4012]: Started
Nov  8 15:42:12 Slot-4 osafimmd[4012]: NO 2PBE configured with IMMSV_PEER_SC_MAX_WAIT: 30 seconds
Nov  8 15:42:12 Slot-4 osafimmnd[4027]: Started
Nov  8 15:42:12 Slot-4 osafimmnd[4027]: NO Persistent Back-End capability configured, Pbe file:imm.db (suffix may get added)
Nov  8 15:42:12 Slot-4 osafimmd[4012]: NO New IMMND process is on ACTIVE Controller at 2020f
Nov  8 15:42:12 Slot-4 osafimmd[4012]: NO Extended intro from node 2020f
Nov  8 15:42:12 Slot-4 osafimmd[4012]: NO First SC IMMND (OpenSAF 4.4 or later) attached 2020f
Nov  8 15:42:12 Slot-4 osafimmd[4012]: NO IMMND on SC found at 2020f this IMMD at 2020f. Cluster is loading. 2PBE configured => Wait.
Nov  8 15:42:12 Slot-4 osafimmnd[4027]: NO SERVER STATE: IMM_SERVER_ANONYMOUS --> IMM_SERVER_CLUSTER_WAITING
Nov  8 15:42:12 Slot-4 osafimmnd[4027]: NO 2PBE startup arbitration initiated from IMMD
Nov  8 15:42:12 Slot-4 osafimmnd[4027]: NO 2PBE configured, IMMSV_PBE_FILE_SUFFIX:.2020f (preload)
Nov  8 15:42:12 Slot-4 osafimmloadd: logtrace: trace enabled to file /var/log/opensaf/osafimmnd, mask=0xffffffff
Nov  8 15:42:12 Slot-4 osafimmloadd: NO 2PBE pre-load starting
Nov  8 15:42:12 Slot-4 osafimmloadd: NO IMMSV_PBE_FILE is defined (imm.db.2020f) check it for existence and SaImmRepositoryInitModeT
Nov  8 15:42:12 Slot-4 osafimmloadd: IN File '/etc/opensaf/imm.db.2020f' is not accessible for read/write, cause:No such file or directory
Nov  8 15:42:12 Slot-4 osafimmloadd: WA Could not open repository:imm.db.2020f
Nov  8 15:42:12 Slot-4 osafimmloadd: NO Trying without suffix
Nov  8 15:42:12 Slot-4 osafimmloadd: IN File '/etc/opensaf/imm.db' is not accessible for read/write, cause:No such file or directory
Nov  8 15:42:12 Slot-4 osafimmloadd: NO 2PBE: Pre-loading from XML file imm.xml at /etc/opensaf
Nov  8 15:42:13 Slot-4 osafimmd[4012]: NO 2PBE wait. Passed time:1089 new timeout: 28911 msecs
Nov  8 15:42:14 Slot-4 osafimmd[4012]: NO 2PBE wait. Passed time:2106 new timeout: 27894 msecs
Nov  8 15:42:15 Slot-4 osafimmd[4012]: NO 2PBE wait. Passed time:3127 new timeout: 26873 msecs
Nov  8 15:42:16 Slot-4 osafimmd[4012]: NO 2PBE wait. Passed time:4144 new timeout: 25856 msecs
Nov  8 15:42:17 Slot-4 osafimmd[4012]: NO 2PBE wait. Passed time:5161 new timeout: 24839 msecs
Nov  8 15:42:18 Slot-4 osafimmd[4012]: NO 2PBE wait. Passed time:6175 new timeout: 23825 msecs
Nov  8 15:42:19 Slot-4 osafimmloadd: WA Failed to create the class OsafImmPbeRt err:6
*Nov  8 15:42:19 Slot-4 osafimmloadd: ER Failed to create the class OsafImmPbeRt - exiting*
Nov  8 15:42:19 Slot-4 osafimmnd[4027]: ER Prealoader failed to obtain stats.
Nov  8 15:42:19 Slot-4 opensafd[3951]: ER Failed   DESC:IMMND
Nov  8 15:42:19 Slot-4 opensafd[3951]: ER Going for recovery
Nov  8 15:42:19 Slot-4 opensafd[3951]: ER Trying To RESPAWN /usr/local/lib/opensaf/clc-cli/osaf-immnd attempt #1
Nov  8 15:42:19 Slot-4 opensafd[3951]: ER Sending SIGKILL to IMMND, pid=4018
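
The "2PBE wait" lines in the case 2 log appear to follow simple arithmetic. The IMMD seems to recompute its remaining wait as the configured 30 seconds minus the time already passed: 30000 - 1089 = 28911 ms, 30000 - 2106 = 27894 ms, and so on. The sketch below reproduces that arithmetic, assuming millisecond units as the log suggests. The helper name is hypothetical and is not the real immd_proc.c code.

```c
#include <assert.h>

/* Hypothetical helper (not the real immd_proc.c code): remaining 2PBE
 * peer wait in msecs, given the configured maximum wait and the time
 * already passed. Returns 0 once the wait has expired. */
static long pbe2_remaining_wait_ms(long max_wait_ms, long passed_ms)
{
    assert(max_wait_ms >= 0 && passed_ms >= 0);
    return passed_ms >= max_wait_ms ? 0 : max_wait_ms - passed_ms;
}
```

For example, pbe2_remaining_wait_ms(30000, 6175) yields 23825. That matches the last "new timeout" line before the pre-loader exits at 15:42:19. The loader therefore fails while most of the 30-second window is still open, which suggests the arbitration timeout itself is not the cause.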


/Neel.

On Tuesday 05 November 2013 09:50 PM, Anders Bjornerstedt wrote:
> Summary: IMM: 2PBE new version of the patch stack [#21]
> Review request for Trac Ticket(s): 21
> Peer Reviewer(s): Neel
> Pull request to:
> Affected branch(es): default (4.4)
> Development branch:
>
> --------------------------------
> Impacted area       Impact y/n
> --------------------------------
>   Docs                    n
>   Build system            n
>   RPM/packaging           n
>   Configuration files     n
>   Startup scripts         n
>   SAF services            y
>   OpenSAF services        n
>   Core libraries          n
>   Samples                 n
>   Tests                   n
>   Other                   n
>
>
> Comments (indicate scope for each "y" above):
> ---------------------------------------------
> The patch stack for the 2PBE enhancement has been updated. This is just a new
> review request for enhancement #21.
>
> The first patch (loading) has been adjusted to apply cleanly on top of 
> changeset:
>
>
>      changeset:   4588:393ca121ca7c
>      user:        Anders Bjornerstedt <[email protected]>
>      date:        Mon Nov 04 18:29:25 2013 +0100
>      summary:     IMM: IMMD file verification made upgrade safe [#596]
>
> The second and third patches are unchanged. Three additional fix patches
> follow on top of that.
>
> changeset 0c554fd3174b67eac5c599af6d6d2cc97b126e51
> Author:       Anders Bjornerstedt <[email protected]>
> Date: Mon, 28 Oct 2013 10:25:27 +0100
>
>       IMM: 2PBE patch-1 (loading) [#21]
>
>       This patch contains the 2PBE loading mechanism, needed to support 2PBE.
>       The IMMDs will detect 2PBE loading by the IMMSV_2PBE_PEER_SC_MAX_WAIT
>       environment variable being set in the immd.conf file. The active IMMD
>       will order each SC IMMND to execute a "preload", probing the SC-local
>       filesystem for the file state that would be loaded to the cluster if
>       that IMMND were chosen as coord. The IMMND sends these stats to the
>       active IMMD.
>
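The "preload" probe described above can be pictured through the lookup order visible in the case 2 log. The loader first tries imm.db with the SC suffix (imm.db.2020f), then imm.db without the suffix, and finally pre-loads from imm.xml. The sketch below assumes that order. The function and enum names are hypothetical. The log also mentions checking SaImmRepositoryInitModeT, which this sketch ignores.

```c
#include <assert.h>
#include <stdio.h>
#include <unistd.h>

enum preload_source { FROM_SUFFIXED_DB, FROM_PLAIN_DB, FROM_XML };

/* Hypothetical sketch of the lookup order seen in the case 2 log:
 * try <dir>/<pbe_file><suffix>, then <dir>/<pbe_file>, and otherwise
 * fall back to pre-loading from the XML file. */
static enum preload_source pick_preload_source(const char *dir,
                                               const char *pbe_file,
                                               const char *suffix)
{
    char path[4096];

    assert(dir != NULL && pbe_file != NULL && suffix != NULL);

    /* e.g. /etc/opensaf/imm.db.2020f, checked for read/write access */
    snprintf(path, sizeof path, "%s/%s%s", dir, pbe_file, suffix);
    if (access(path, R_OK | W_OK) == 0)
        return FROM_SUFFIXED_DB;

    /* "Trying without suffix": e.g. /etc/opensaf/imm.db */
    snprintf(path, sizeof path, "%s/%s", dir, pbe_file);
    if (access(path, R_OK | W_OK) == 0)
        return FROM_PLAIN_DB;

    /* Neither sqlite file is usable: pre-load from imm.xml instead. */
    return FROM_XML;
}
```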
>       The IMMD will wait for the IMMNDs at *both* SCs to complete this task
>       and then determine which SC has the apparently latest file state. The
>       IMMND at that SC will then be chosen as IMMND coord. Actual loading
>       then proceeds in the same way as for regular 1PBE. The
>       IMMSV_2PBE_PEER_SC_MAX_WAIT defaults to 30 seconds. This value should
>       be high enough to make it very unlikely that the active IMMD is forced
>       to choose a loader when only a single SC IMMND has joined. If that
>       happens, the risk is that the cluster restart will *not* use the
>       latest persistent imm state, effectively rewinding the imm state.
>       (Note that the same thing will happen with regular 1PBE based on a
>       shared filesystem (DRBD) if one SC fails to come up in time to join
>       the DRBD sync protocol. The corresponding DRBD timeout is on the order
>       of 20 seconds.)
>
>       When loading has completed, additional 2PBE functionality will start
>       two PBEs, one at each SC. That functionality is delivered in
>       subsequent patches.
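Below is a minimal sketch of the arbitration step described in patch-1. The IMMD waits for preload stats from both SC IMMNDs. Once both have arrived, it picks the SC whose file state appears newest. If the wait expires first, it settles for whichever SC reported. The struct and its fields (epoch, max_ccb_id) are hypothetical stand-ins for the real preload stats, and the comparison order is an assumption.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical preload stats from one SC IMMND; the real message fields
 * may differ. */
struct preload_stats {
    bool     received;    /* stats have arrived from this SC's IMMND */
    uint32_t epoch;       /* assumed: epoch recorded in the file */
    uint64_t max_ccb_id;  /* assumed: highest CCB id recorded in the file */
};

/* Returns 0 or 1 (the SC whose IMMND becomes coord), or -1 to keep
 * waiting. Once the peer wait has expired, a single reporting SC is
 * accepted, which is the "rewind" risk described above. */
static int choose_coord_sc(const struct preload_stats sc[2], bool wait_expired)
{
    assert(sc != NULL);

    if (sc[0].received && sc[1].received) {
        /* Both SCs reported: the apparently latest file state wins. */
        if (sc[0].epoch != sc[1].epoch)
            return sc[0].epoch > sc[1].epoch ? 0 : 1;
        return sc[0].max_ccb_id >= sc[1].max_ccb_id ? 0 : 1;
    }
    if (!wait_expired)
        return -1;              /* keep waiting for the peer SC */
    if (sc[0].received)
        return 0;
    if (sc[1].received)
        return 1;
    return -1;                  /* nobody reported; nothing to choose */
}
```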
>
> changeset 0f3cc59f1eb8031034ac41485cd75438ec17c4b1
> Author:       Anders Bjornerstedt <[email protected]>
> Date: Fri, 11 Oct 2013 01:17:44 +0200
>
>       IMM: 2PBE patch-2 (dumping) [#21]
>
>       This patch contains the 2PBE dumping mechanism, needed to support
>       2PBE. A PBE process is started by the IMMND at each SC, not just the
>       IMMND coord.
>
>       The PBE colocated with the IMMND coord, called the primary PBE, is
>       still the coordinator for transaction commits (for CCBs, PRT
>       operations and class create/deletes). The primary PBE (sometimes
>       called the A-side PBE) works very much in the same way as the single
>       PBE does in regular 1PBE. The PBE colocated with the SC-resident but
>       non-coord IMMND is called the slave PBE (sometimes called the B-side
>       PBE).
>
>       With 2PBE, *both* PBEs must be available for the imm to be
>       persistent-writable. If one or both PBEs are unavailable (or
>       unresponsive), then persistent writes (CCBs and PRT operations) will
>       fail.
>
>       In 2PBE, a restarted PBE will more often need to regenerate the sqlite
>       file. On the other hand, regeneration of the sqlite file should be
>       faster in 2PBE than in regular 1PBE because the file is typically
>       placed on a local file system.
>
>       A subsequent patch will provide a mechanism for allowing 1-safe-2PBE.
>       This will allow the imm to open up for persistent writes when only one
>       of the two PBEs is available. This will only be allowed when, and for
>       as long as, an SC is absent. As soon as the other SC rejoins, the IMM
>       has to re-enter the non-persistent-writable state.
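The normal (2-safe) rule from patch-2 reduces to a conjunction. Persistent writes are allowed only while both the primary and the slave PBE are available. The sketch below models that rule. The struct and function names are hypothetical and do not come from the IMMND sources.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical availability view of the two PBEs. */
struct pbe_pair {
    bool primary_available;  /* A-side PBE, colocated with the IMMND coord */
    bool slave_available;    /* B-side PBE, at the other SC */
};

/* Normal (2-safe) 2PBE rule: persistent writes such as CCB commits and
 * PRT operations are allowed only if both PBEs are available. */
static bool persistent_writable_2safe(const struct pbe_pair *p)
{
    assert(p != NULL);
    return p->primary_available && p->slave_available;
}
```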
>
> changeset d99a312f527c8fb701149ff840dce1bffe416d75
> Author:       Anders Bjornerstedt <[email protected]>
> Date: Fri, 11 Oct 2013 02:08:51 +0200
>
>       IMM: 2PBE patch-3 (1safe2pbe) [#21]
>
>       This patch contains the 2PBE 1safe2PBE mechanism. This mechanism
>       allows an OpenSAF cluster to open up for persistent writes using only
>       one of the two PBEs, temporarily.
>
>       This is only intended to be used as an emergency action when one SC is
>       unavailable long-term, e.g. due to hardware problems. As soon as the
>       other SC returns, the IMM has to re-enter normal 2-safe 2PBE and reject
>       persistent writes until the slave PBE has synced (regenerated its
>       sqlite file) and rejoined the cluster.
>
>       The 1safe2PBE state is entered by the administrative operation:
>
>        immadm -o 1 -p opensafImmNostdFlags:SA_UINT32_T:8 \
>       opensafImm=opensafImm,safApp=safImmService
>
>       It is exited either automatically by a rejoined SC or by an explicit
>       administrative operation:
>
>        immadm -o 2 -p opensafImmNostdFlags:SA_UINT32_T:8 \
>       opensafImm=opensafImm,safApp=safImmService
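The sketch below models the 1safe2PBE switch. It assumes, as the two commands above suggest, that admin operation 1 sets and operation 2 clears the given bits in opensafImmNostdFlags. Under that assumption, 1safe2PBE is bit 8 (0x8) being set or cleared, with an automatic clear when the other SC rejoins. It is a hypothetical model, not the IMMND code. It also assumes that the single PBE left running in 1safe2PBE is the primary at the surviving SC.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Bit passed as opensafImmNostdFlags:SA_UINT32_T:8 in the commands above. */
#define NOSTD_FLAG_1SAFE2PBE 0x8u

/* Assumed semantics: admin op 1 sets the given bits, op 2 clears them. */
static uint32_t apply_nostd_flags_op(uint32_t flags, int op_id, uint32_t bits)
{
    assert(op_id == 1 || op_id == 2);
    return op_id == 1 ? (flags | bits) : (flags & ~bits);
}

/* Automatic exit: a rejoining SC switches 1safe2PBE off again. */
static uint32_t on_peer_sc_rejoined(uint32_t flags)
{
    return flags & ~NOSTD_FLAG_1SAFE2PBE;
}

/* In 1safe2PBE the primary PBE alone is enough (assumed to be the PBE
 * at the surviving SC); otherwise both PBEs are required. */
static bool persistent_writable(uint32_t flags, bool primary_up, bool slave_up)
{
    if (flags & NOSTD_FLAG_1SAFE2PBE)
        return primary_up;
    return primary_up && slave_up;
}
```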
>
> changeset d4b720966e5e0769130163a21c5e9642d9e14864
> Author:       Anders Bjornerstedt <[email protected]>
> Date: Tue, 05 Nov 2013 16:41:05 +0100
>
>       IMM: 2PBE patch-4 (Fix for si-swap problem) [#21]
>
>       With the 2PBE patches applied and 2PBE configured, if si_swap:
>
>        immadm -o 7 safSi=SC-2N,safApp=OpenSAF
>
>       is attempted twice, then the second time will cause the new active SC
>       to reboot. This was caused by the cb->is_loading variable being
>       initialized to true at both the active and the standby SC, when it
>       should only have been set to true at the active. When it is also true
>       at the standby, it is never set to false on loading completion. When
>       the standby becomes active, it will not mbcp fevs messages to the new
>       standby, even though loading completed long ago. The next si-swap
>       causes the new new active to start sending fevs messages numbered way
>       below the expected fevs number. This causes the new new standby to
>       crash.
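Below is a minimal sketch of the patch-4 fix: is_loading starts out true only at the active IMMD. The control-block struct and role enum are hypothetical simplifications. Only the is_loading field name comes from the commit message.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical HA role; the real IMMD uses the AMF-assigned HA state. */
enum ha_role { ROLE_ACTIVE, ROLE_STANDBY };

/* Hypothetical subset of the IMMD control block. */
struct immd_cb_sketch {
    enum ha_role role;
    bool is_loading;
};

/* A standby that started with is_loading = true never saw loading
 * complete, so after a later si-swap it still believed loading was in
 * progress. Initializing it from the role avoids that. */
static void immd_cb_init_sketch(struct immd_cb_sketch *cb, enum ha_role role)
{
    assert(cb != NULL);
    cb->role = role;
    cb->is_loading = (role == ROLE_ACTIVE);
}
```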
>
> changeset ec4d0f6e1b90ab7dae6d2ab923aafe03f90c3f71
> Author:       Anders Bjornerstedt <[email protected]>
> Date: Tue, 05 Nov 2013 16:52:49 +0100
>
>       IMM: 2PBE patch-5 (Fix loading retry problem) [#21]
>
>       If, after loading arbitration is done, loading is started but fails
>       (for example due to a corrupt sqlite file), then the IMMNDs are
>       restarted but not the IMMDs. The already collected loading arbitration
>       info in the active IMMD is not cleared, so in the next loading attempt
>       the loading arbitration will only wait for the stats from one of the
>       SCs, then find that it already has stats from both SCs, although one
>       of them is actually stats from the previous load (!). This can result
>       in incorrect arbitration. The file for which loading fails will have
>       been moved to imm.db.xxxxx.failed and thus not used for arbitration.
>       The fallback file is typically much older, but that may be masked by
>       the old preload stats for that SC. The preload stats need to be
>       cleared here when and if loading is restarted.
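The patch-5 problem comes down to bookkeeping. Stale per-SC stats from a failed load make "stats from both SCs" look satisfied too early. The sketch below shows the reset, assuming a simple received-flag per SC. All names here are hypothetical.

```c
#include <assert.h>
#include <stdbool.h>
#include <string.h>

/* Hypothetical arbitration bookkeeping kept by the active IMMD. */
struct arbitration_state {
    bool stats_received[2];           /* index 0: SC-1, index 1: SC-2 */
    unsigned long long file_state[2]; /* assumed "how new" measure */
};

static void record_stats(struct arbitration_state *a, int sc,
                         unsigned long long state)
{
    assert(a != NULL && (sc == 0 || sc == 1));
    a->stats_received[sc] = true;
    a->file_state[sc] = state;
}

static bool have_stats_from_both(const struct arbitration_state *a)
{
    return a->stats_received[0] && a->stats_received[1];
}

/* When the IMMNDs are restarted after a failed load, drop the previous
 * round's stats so arbitration waits for fresh ones from both SCs. */
static void reset_on_loading_restart(struct arbitration_state *a)
{
    memset(a, 0, sizeof *a);
}
```

Without the reset, SC-1's stale stats from the failed attempt would still count as "received". The older fallback file at SC-1 could then lose to, or be masked by, that earlier value.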
>
> changeset 03064f5a3812a865e1c59abf59e1b51ace9d397e
> Author:       Anders Bjornerstedt <[email protected]>
> Date: Tue, 05 Nov 2013 17:00:20 +0100
>
>       IMM: 2PBE patch-6 (PBE slave can re-attach when empty CCBs exist) [#21]
>
>       When a 2PBE system is up and running with CCBs being generated, if
>       one SC is rebooted, then after the SC has synced imm-ram, the slave
>       PBE typically has trouble being allowed to generate its imm.db.xxxxx
>       file. It keeps getting rejected due to active CCBs. There really
>       should not be any active CCBs allowed here, because the sync of the
>       returned SC would only have started when there were no active CCBs,
>       and once sync is finished the imm should still not be persistent
>       writable. The only problem here is that empty CCBs are allowed to be
>       created. Thus the condition for allowing the slave to generate its
>       imm.db.xxxx file needs to be relaxed to allow empty CCBs.
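Below is a minimal sketch of the relaxed patch-6 condition. An active CCB blocks slave regeneration only if it actually contains operations. The CCB struct is a hypothetical minimal view, not the ImmModel representation.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical minimal view of a CCB for this check. */
struct ccb_sketch {
    bool   active;   /* CCB not yet committed or aborted */
    size_t num_ops;  /* operations added to the CCB so far */
};

/* Before the fix, any active CCB blocked the slave PBE. After it, only
 * active CCBs that contain operations block it; empty CCBs are allowed. */
static bool slave_may_regenerate(const struct ccb_sketch *ccbs, size_t n)
{
    assert(ccbs != NULL || n == 0);
    for (size_t i = 0; i < n; i++)
        if (ccbs[i].active && ccbs[i].num_ops > 0)
            return false;
    return true;
}
```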
>
>
> Complete diffstat:
> ------------------
>   osaf/libs/agents/saf/imma/imma_db.c              |     4 +-
>   osaf/libs/agents/saf/imma/imma_oi_api.c          |     4 +-
>   osaf/libs/agents/saf/imma/imma_proc.c            |    34 +-
>   osaf/libs/common/immsv/immpbe_dump.cc            |   148 +++-
>   osaf/libs/common/immsv/immsv_evt.c               |    55 +-
>   osaf/libs/common/immsv/include/immpbe_dump.hh    |    10 +-
>   osaf/libs/common/immsv/include/immsv_api.h       |    21 +-
>   osaf/libs/common/immsv/include/immsv_evt.h       |    18 +-
>   osaf/libs/common/immsv/include/immsv_evt_model.h |     4 +
>   osaf/services/saf/immsv/immd/immd_amf.c          |     5 +-
>   osaf/services/saf/immsv/immd/immd_cb.h           |     8 +-
>   osaf/services/saf/immsv/immd/immd_db.c           |     5 +
>   osaf/services/saf/immsv/immd/immd_evt.c          |    82 ++-
>   osaf/services/saf/immsv/immd/immd_main.c         |    51 +-
>   osaf/services/saf/immsv/immd/immd_mbcsv.c        |     2 +-
>   osaf/services/saf/immsv/immd/immd_proc.c         |   241 ++++++-
>   osaf/services/saf/immsv/immd/immd_proc.h         |     3 +-
>   osaf/services/saf/immsv/immd/immd_sbevt.c        |    47 +-
>   osaf/services/saf/immsv/immloadd/imm_loader.cc   |   323 +++++++-
>   osaf/services/saf/immsv/immloadd/imm_loader.hh   |     8 +-
>   osaf/services/saf/immsv/immloadd/imm_pbe_load.cc |   224 +++++-
>   osaf/services/saf/immsv/immnd/ImmModel.cc        |   583 ++++++++++++---
>   osaf/services/saf/immsv/immnd/ImmModel.hh        |    26 +-
>   osaf/services/saf/immsv/immnd/ImmSearchOp.cc     |     5 -
>   osaf/services/saf/immsv/immnd/immnd_cb.h         |    10 +-
>   osaf/services/saf/immsv/immnd/immnd_evt.c        |   582 ++++++++++++++--
>   osaf/services/saf/immsv/immnd/immnd_init.h       |    19 +-
>   osaf/services/saf/immsv/immnd/immnd_main.c       |     3 +-
>   osaf/services/saf/immsv/immnd/immnd_proc.c       |   319 +++++++-
>   osaf/services/saf/immsv/immpbed/immpbe.cc        |    77 +-
>   osaf/services/saf/immsv/immpbed/immpbe.hh        |     4 +
>   osaf/services/saf/immsv/immpbed/immpbe_daemon.cc |  2021 ++++++++++++++++++++++++++++++++++++++++++--------------
>   32 files changed, 3969 insertions(+), 977 deletions(-)
>
>
> Testing Commands:
> -----------------
> 2PBE is enabled by uncommenting this environment variable in immd.conf:
>
>       export IMMSV_2PBE_PEER_SC_MAX_WAIT=30
>
>
>
> Testing, Expected Results:
> --------------------------
> 2PBE should work, incrementally dumping all persistent data changes to
> sqlite files at both SC-1 and SC-2.
>
>
> Conditions of Submission:
> -------------------------
> Ack from Neel
>
>
> Arch      Built     Started    Linux distro
> -------------------------------------------
> mips        n          n
> mips64      n          n
> x86         n          n
> x86_64      n          n
> powerpc     n          n
> powerpc64   n          n
>
>
> Reviewer Checklist:
> -------------------
> [Submitters: make sure that your review doesn't trigger any checkmarks!]
>
>
> Your checkin has not passed review because (see checked entries):
>
> ___ Your RR template is generally incomplete; it has too many blank entries
>      that need proper data filled in.
>
> ___ You have failed to nominate the proper persons for review and push.
>
> ___ Your patches do not have proper short+long header
>
> ___ You have grammar/spelling in your header that is unacceptable.
>
> ___ You have exceeded a sensible line length in your headers/comments/text.
>
> ___ You have failed to put in a proper Trac Ticket # into your commits.
>
> ___ You have incorrectly put/left internal data in your comments/files
>      (i.e. internal bug tracking tool IDs, product names etc)
>
> ___ You have not given any evidence of testing beyond basic build tests.
>      Demonstrate some level of runtime or other sanity testing.
>
> ___ You have ^M present in some of your files. These have to be removed.
>
> ___ You have needlessly changed whitespace or added whitespace crimes
>      like trailing spaces, or spaces before tabs.
>
> ___ You have mixed real technical changes with whitespace and other
>      cosmetic code cleanup changes. These have to be separate commits.
>
> ___ You need to refactor your submission into logical chunks; there is
>      too much content in a single commit.
>
> ___ You have extraneous garbage in your review (merge commits etc)
>
> ___ You have giant attachments which should never have been sent;
>      Instead you should place your content in a public tree to be pulled.
>
> ___ You have too many commits attached to an e-mail; resend as threaded
>      commits, or place in a public tree for a pull.
>
> ___ You have resent this content multiple times without a clear indication
>      of what has changed between each re-send.
>
> ___ You have failed to adequately and individually address all of the
>      comments and change requests that were proposed in the initial review.
>
> ___ You have a misconfigured ~/.hgrc file (i.e. username, email etc)
>
> ___ Your computer has a badly configured date and time, confusing the
>      threaded patch review.
>
> ___ Your changes affect the IPC mechanism, and you don't present any
>      results from in-service upgradability testing.
>
> ___ Your changes affect the user manual and documentation, but your patch
>      series does not contain the patch that updates the Doxygen manual.
>

_______________________________________________
Opensaf-devel mailing list
[email protected]
https://lists.sourceforge.net/lists/listinfo/opensaf-devel
