Ok, looking into it.

Thanks
/AndersBj

________________________________
From: Neelakanta Reddy [mailto:[email protected]]
Sent: 8 November 2013 11:23
To: Anders Björnerstedt
Cc: [email protected]
Subject: Re: [PATCH 0 of 6] Review Request for 2PBE - updated patch stack

Hi Andersbj,

While testing I am not able to proceed.

Case 1:
Neither 2PBE nor PBE is configured. I get the following error while starting
OpenSAF:

Nov  8 15:38:59 Slot-3 kernel: TIPC: Own node address <1.1.1>, network identity 
1234
Nov  8 15:38:59 Slot-3 kernel: TIPC: Enabled bearer <eth:eth1>, discovery 
domain <1.1.0>, priority 10
Nov  8 15:38:59 Slot-3 osafrded[4713]: Started
Nov  8 15:39:01 Slot-3 osafrded[4713]: NO Peer not available => Active role
Nov  8 15:39:01 Slot-3 osaffmd[4727]: Started
Nov  8 15:39:01 Slot-3 osafimmd[4742]: Started
Nov  8 15:39:01 Slot-3 osafimmnd[4757]: Started
Nov  8 15:39:01 Slot-3 osafimmd[4742]: NO New IMMND process is on ACTIVE 
Controller at 2010f
Nov  8 15:39:01 Slot-3 osafimmd[4742]: NO First SC IMMND (OpenSAF 4.4 or later) 
attached 2010f
Nov  8 15:39:01 Slot-3 osafimmd[4742]: NO First IMMND at SC to attach is NOT 
configured for PBE
Nov  8 15:39:01 Slot-3 osafimmd[4742]: NO First IMMND on SC found at 2010f this 
IMMD at 2010f. Cluster is loading, *not* 2PBE => designating that IMMND as 
coordinator
Nov  8 15:39:01 Slot-3 osafimmnd[4757]: NO SERVER STATE: IMM_SERVER_ANONYMOUS 
--> IMM_SERVER_CLUSTER_WAITING
Nov  8 15:39:01 Slot-3 osafimmnd[4757]: NO This IMMND is now the NEW Coord
Nov  8 15:39:04 Slot-3 osafimmnd[4757]: NO SERVER STATE: 
IMM_SERVER_CLUSTER_WAITING --> IMM_SERVER_LOADING_PENDING
Nov  8 15:39:04 Slot-3 osafimmd[4742]: NO Successfully announced loading. New 
ruling epoch:1
Nov  8 15:39:04 Slot-3 osafimmnd[4757]: NO SERVER STATE: 
IMM_SERVER_LOADING_PENDING --> IMM_SERVER_LOADING_SERVER
Nov  8 15:39:04 Slot-3 osafimmnd[4757]: NO NODE STATE-> IMM_NODE_LOADING
Nov  8 15:39:04 Slot-3 osafimmnd[4757]: ER LOADING APPARENTLY FAILED status:0
Nov  8 15:39:04 Slot-3 osafimmd[4742]: ER ******** LOADING FAILED. File(s) 
possibly missing, inaccessible or corrupt .. ? *********
Nov  8 15:39:04 Slot-3 opensafd[4681]: ER Failed   DESC:IMMND
Nov  8 15:39:04 Slot-3 opensafd[4681]: ER Going for recovery
Nov  8 15:39:04 Slot-3 opensafd[4681]: ER Trying To RESPAWN 
/usr/local/lib/opensaf/clc-cli/osaf-immnd attempt #1
Nov  8 15:39:04 Slot-3 opensafd[4681]: ER Sending SIGKILL to IMMND, pid=4749
Nov  8 15:39:04 Slot-3 osafimmnd[4757]: ER IMMND - Periodic server job failed


Case 2:
Both 2PBE and PBE are configured, and no imm.db is present:

Nov  8 15:42:10 Slot-4 kernel: TIPC: Enabled bearer <eth:eth1>, discovery 
domain <1.1.0>, priority 10
Nov  8 15:42:10 Slot-4 osafrded[3983]: Started
Nov  8 15:42:12 Slot-4 osafrded[3983]: NO Peer not available => Active role
Nov  8 15:42:12 Slot-4 osaffmd[3997]: Started
Nov  8 15:42:12 Slot-4 osafimmd[4012]: Started
Nov  8 15:42:12 Slot-4 osafimmd[4012]: NO 2PBE configured with 
IMMSV_PEER_SC_MAX_WAIT: 30 seconds
Nov  8 15:42:12 Slot-4 osafimmnd[4027]: Started
Nov  8 15:42:12 Slot-4 osafimmnd[4027]: NO Persistent Back-End capability 
configured, Pbe file:imm.db (suffix may get added)
Nov  8 15:42:12 Slot-4 osafimmd[4012]: NO New IMMND process is on ACTIVE 
Controller at 2020f
Nov  8 15:42:12 Slot-4 osafimmd[4012]: NO Extended intro from node 2020f
Nov  8 15:42:12 Slot-4 osafimmd[4012]: NO First SC IMMND (OpenSAF 4.4 or later) 
attached 2020f
Nov  8 15:42:12 Slot-4 osafimmd[4012]: NO IMMND on SC found at 2020f this IMMD 
at 2020f. Cluster is loading. 2PBE configured => Wait.
Nov  8 15:42:12 Slot-4 osafimmnd[4027]: NO SERVER STATE: IMM_SERVER_ANONYMOUS 
--> IMM_SERVER_CLUSTER_WAITING
Nov  8 15:42:12 Slot-4 osafimmnd[4027]: NO 2PBE startup arbitration initiated 
from IMMD
Nov  8 15:42:12 Slot-4 osafimmnd[4027]: NO 2PBE configured, 
IMMSV_PBE_FILE_SUFFIX:.2020f (preload)
Nov  8 15:42:12 Slot-4 osafimmloadd: logtrace: trace enabled to file 
/var/log/opensaf/osafimmnd, mask=0xffffffff
Nov  8 15:42:12 Slot-4 osafimmloadd: NO 2PBE pre-load starting
Nov  8 15:42:12 Slot-4 osafimmloadd: NO IMMSV_PBE_FILE is defined 
(imm.db.2020f) check it for existence and SaImmRepositoryInitModeT
Nov  8 15:42:12 Slot-4 osafimmloadd: IN File '/etc/opensaf/imm.db.2020f' is not 
accessible for read/write, cause:No such file or directory
Nov  8 15:42:12 Slot-4 osafimmloadd: WA Could not open repository:imm.db.2020f
Nov  8 15:42:12 Slot-4 osafimmloadd: NO Trying without suffix
Nov  8 15:42:12 Slot-4 osafimmloadd: IN File '/etc/opensaf/imm.db' is not 
accessible for read/write, cause:No such file or directory
Nov  8 15:42:12 Slot-4 osafimmloadd: NO 2PBE: Pre-loading from XML file imm.xml 
at /etc/opensaf
Nov  8 15:42:13 Slot-4 osafimmd[4012]: NO 2PBE wait. Passed time:1089 new 
timeout: 28911 msecs
Nov  8 15:42:14 Slot-4 osafimmd[4012]: NO 2PBE wait. Passed time:2106 new 
timeout: 27894 msecs
Nov  8 15:42:15 Slot-4 osafimmd[4012]: NO 2PBE wait. Passed time:3127 new 
timeout: 26873 msecs
Nov  8 15:42:16 Slot-4 osafimmd[4012]: NO 2PBE wait. Passed time:4144 new 
timeout: 25856 msecs
Nov  8 15:42:17 Slot-4 osafimmd[4012]: NO 2PBE wait. Passed time:5161 new 
timeout: 24839 msecs
Nov  8 15:42:18 Slot-4 osafimmd[4012]: NO 2PBE wait. Passed time:6175 new 
timeout: 23825 msecs
Nov  8 15:42:19 Slot-4 osafimmloadd: WA Failed to create the class OsafImmPbeRt 
err:6
Nov  8 15:42:19 Slot-4 osafimmloadd: ER Failed to create the class OsafImmPbeRt 
- exiting
Nov  8 15:42:19 Slot-4 osafimmnd[4027]: ER Prealoader failed to obtain stats.
Nov  8 15:42:19 Slot-4 opensafd[3951]: ER Failed   DESC:IMMND
Nov  8 15:42:19 Slot-4 opensafd[3951]: ER Going for recovery
Nov  8 15:42:19 Slot-4 opensafd[3951]: ER Trying To RESPAWN 
/usr/local/lib/opensaf/clc-cli/osaf-immnd attempt #1
Nov  8 15:42:19 Slot-4 opensafd[3951]: ER Sending SIGKILL to IMMND, pid=4018


/Neel.

On Tuesday 05 November 2013 09:50 PM, Anders Bjornerstedt wrote:

Summary: IMM: 2PBE new version of the patch stack [#21]
Review request for Trac Ticket(s): 21
Peer Reviewer(s): Neel
Pull request to:
Affected branch(es): default (4.4)
Development branch:

--------------------------------
Impacted area       Impact y/n
--------------------------------
 Docs                    n
 Build system            n
 RPM/packaging           n
 Configuration files     n
 Startup scripts         n
 SAF services            y
 OpenSAF services        n
 Core libraries          n
 Samples                 n
 Tests                   n
 Other                   n


Comments (indicate scope for each "y" above):
---------------------------------------------
The patch stack for the 2PBE enhancement has been updated. This is just a new
review request for enhancement #21.

The first patch (loading) has been adjusted to apply cleanly on top of changeset:


    changeset:   4588:393ca121ca7c
    user:        Anders Bjornerstedt <[email protected]>
    date:        Mon Nov 04 18:29:25 2013 +0100
    summary:     IMM: IMMD file verification made upgrade safe [#596]

The second and third patches are unchanged. Three additional fix patches follow
on top of that.

changeset 0c554fd3174b67eac5c599af6d6d2cc97b126e51
Author: Anders Bjornerstedt <[email protected]>
Date:   Mon, 28 Oct 2013 10:25:27 +0100

        IMM: 2PBE patch-1 (loading) [#21]

        This patch contains the 2PBE loading mechanism, needed to support 2PBE.
        The IMMDs will detect 2PBE loading by the IMMSV_2PBE_PEER_SC_MAX_WAIT
        environment variable being set in the immd.conf file. The active IMMD
        will order each SC IMMND to execute a "preload", probing the SC-local
        filesystem for the file state that would be loaded to the cluster if
        that IMMND was chosen as coord. The IMMND sends these stats to the
        active IMMD.

        The IMMD will wait for the IMMNDs at *both* SCs to complete this task
        and then determine which SC has the apparently latest file state. The
        IMMND at that SC will then be chosen as IMMND coord. Actual loading
        then proceeds in the same way as for regular 1PBE. The
        IMMSV_2PBE_PEER_SC_MAX_WAIT is by default 30 seconds. This value should
        be high enough to make it very unlikely that the active IMMD is forced
        to choose a loader when only a single SC IMMND has joined. If that
        happens, then the risk is that the cluster restart will be done *not*
        using the latest persistent imm state, effectively rewinding the imm
        state. (Note that the same thing will happen with regular 1PBE based on
        a shared filesystem (DRBD) if one SC fails to come up in time to join
        the DRBD sync protocol. The corresponding DRBD timeout is on the order
        of 20 seconds.)

        When loading has completed, additional 2PBE functionality will start
        two PBEs, one at each SC. That functionality is delivered in subsequent
        patches.
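
The arbitration described in patch-1 can be sketched roughly as below. This is
only an illustration and not the IMMD code: the PreloadStats fields (epoch,
commit_time), the function names and the comparison rule for "latest file
state" are all assumptions made for the example.

```python
from dataclasses import dataclass
from typing import Dict, Optional


@dataclass(frozen=True)
class PreloadStats:
    """Hypothetical summary of the file state one SC would load."""
    epoch: int        # assumed: a higher epoch means newer persistent state
    commit_time: int  # assumed tie-breaker: time of the last commit


def remaining_wait_ms(max_wait_s: int, passed_ms: int) -> int:
    """Time left before the active IMMD stops waiting for the peer SC."""
    return max(0, max_wait_s * 1000 - passed_ms)


def choose_coord(stats: Dict[str, PreloadStats], passed_ms: int,
                 max_wait_s: int = 30, expected_scs: int = 2) -> Optional[str]:
    """Return the SC whose IMMND becomes coord, or None to keep waiting."""
    still_waiting = (len(stats) < expected_scs
                     and remaining_wait_ms(max_wait_s, passed_ms) > 0)
    if not stats or still_waiting:
        return None
    # Assumed rule: the newest (epoch, commit_time) pair wins the arbitration.
    return max(stats, key=lambda sc: (stats[sc].epoch, stats[sc].commit_time))
```

With the default of 30 seconds, remaining_wait_ms(30, 1089) gives 28911, which
matches the "Passed time:1089 new timeout: 28911 msecs" line in the case 2 log.
Choosing a coord with only one SC reported is only possible once the wait has
run out, which is the rewind risk the commit message warns about.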

changeset 0f3cc59f1eb8031034ac41485cd75438ec17c4b1
Author: Anders Bjornerstedt <[email protected]>
Date:   Fri, 11 Oct 2013 01:17:44 +0200

        IMM: 2PBE patch-2 (dumping) [#21]

        This patch contains the 2PBE dumping mechanism, needed to support 2PBE.
        A PBE process is started by the IMMND at each SC, not just the IMMND
        coord.

        The PBE colocated with the IMMND coord, called the primary PBE, is
        still the coordinator for transaction commits (for CCBs, PRT operations
        and class create/deletes). The primary PBE (sometimes called the A-side
        PBE) works very much in the same way as the regular single PBE does in
        1PBE. The PBE colocated with the SC-resident but non-coord IMMND is
        called the slave PBE (sometimes called the B-side PBE).

        With 2PBE, *both* PBEs must be available for the imm to be
        persistent-writable. If one or both PBEs are unavailable (or
        unresponsive) then persistent writes (CCBs and PRT operations) will
        fail.

        In 2PBE, a restarted PBE will more often need to regenerate the sqlite
        file. On the other hand, regeneration of the sqlite file should be
        faster in 2PBE than in regular 1PBE because the file is typically
        placed on a local file system.

        A subsequent patch will provide a mechanism for allowing 1-safe-2PBE.
        This will allow the imm to open up for persistent writes when only one
        of the two PBEs is available. This will only be allowed during the
        absence of an SC. As soon as the other SC rejoins, the IMM has to
        re-enter the non-persistent-writable state.
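
The persistent-writable rule described in patch-2 can be summarized as a small
predicate. This is a sketch under stated assumptions, not the IMMND logic: the
function and parameter names are hypothetical, and the 1-safe-2PBE case is
modeled only from the description above.

```python
def persistent_writable(primary_pbe_up: bool, slave_pbe_up: bool,
                        one_safe_2pbe: bool = False,
                        peer_sc_present: bool = True) -> bool:
    """Return True if persistent writes (CCBs, PRT operations) may proceed."""
    # Normal 2-safe 2PBE: both PBEs must be available.
    if primary_pbe_up and slave_pbe_up:
        return True
    # Assumed 1-safe-2PBE rule: a single PBE suffices only while the mode is
    # enabled and the other SC is absent.
    one_pbe_up = primary_pbe_up or slave_pbe_up
    return one_safe_2pbe and not peer_sc_present and one_pbe_up
```

Once the other SC is present again the predicate falls back to requiring both
PBEs, which corresponds to re-entering the non-persistent-writable state until
the slave PBE is back.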

changeset d99a312f527c8fb701149ff840dce1bffe416d75
Author: Anders Bjornerstedt <[email protected]>
Date:   Fri, 11 Oct 2013 02:08:51 +0200

        IMM: 2PBE patch-3 (1safe2pbe) [#21]

        This patch contains the 2PBE 1safe2PBE mechanism. This mechanism allows
        an OpenSAF cluster to open up for persistent writes using only one of
        the two PBEs - temporarily.

        This is only intended to be used as an emergency action when one SC is
        long-term unavailable, e.g. due to hardware problems. As soon as the
        other SC returns, the IMM has to re-enter normal 2-safe 2PBE and reject
        persistent writes until the slave PBE has synced (regenerated its
        sqlite file) and rejoined the cluster.

        The 1safe2PBE state is entered by the administrative operation:

         immadm -o 1 -p opensafImmNostdFlags:SA_UINT32_T:8 \
        opensafImm=opensafImm,safApp=safImmService

        It is exited either automatically by a rejoined SC or by an explicit
        administrative operation:

         immadm -o 2 -p opensafImmNostdFlags:SA_UINT32_T:8 \
        opensafImm=opensafImm,safApp=safImmService

changeset d4b720966e5e0769130163a21c5e9642d9e14864
Author: Anders Bjornerstedt <[email protected]>
Date:   Tue, 05 Nov 2013 16:41:05 +0100

        IMM: 2PBE patch-4 (Fix for si-swap problem) [#21]

         With the 2PBE patches applied and 2PBE configured, if si_swap:

         immadm -o 7 safSi=SC-2N,safApp=OpenSAF

        is attempted twice, then the second time will cause the new active SC
        to reboot. This was caused by the cb->is_loading variable being
        initialized to true at both the active and the standby SC, when it
        should only have been set to true at the active. When it is also true
        at the standby, it is not set to false when loading completes. When the
        standby becomes active, it will not mbcp fevs messages to the new
        standby, even though loading was done a long time ago. The next si-swap
        causes the new new active to start sending fevs messages way below the
        expected fevs number. This causes the new new standby to crash.
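
A toy model of the flag handling described in patch-4 is sketched below. It is
not the IMMD code: the Immd class, its methods and the start() helper are
hypothetical, and only the behaviour stated in the commit message is modeled
(the flag is cleared at the active only, and fevs messages are not
checkpointed while the flag is set).

```python
from dataclasses import dataclass


@dataclass
class Immd:
    role: str         # "active" or "standby"
    is_loading: bool  # stands in for cb->is_loading

    def on_loading_completed(self) -> None:
        # Per the description, loading completion clears the flag only at
        # the active; a standby keeps whatever it was initialized with.
        if self.role == "active":
            self.is_loading = False

    def checkpoints_fevs(self) -> bool:
        # Assumed: fevs messages reach the standby only when not loading.
        return self.role == "active" and not self.is_loading


def start(role: str, fixed: bool) -> Immd:
    # Buggy init: true at both SCs. Fixed init: true only at the active.
    return Immd(role=role, is_loading=(role == "active") if fixed else True)


def standby_after_swap(fixed: bool) -> Immd:
    standby = start("standby", fixed)
    standby.on_loading_completed()  # has no effect on a standby
    standby.role = "active"         # first si-swap
    return standby
```

In this model standby_after_swap(fixed=False) never checkpoints fevs messages
after taking over, so its new standby falls behind, which is the precondition
for the crash on the second si-swap.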

changeset ec4d0f6e1b90ab7dae6d2ab923aafe03f90c3f71
Author: Anders Bjornerstedt <[email protected]>
Date:   Tue, 05 Nov 2013 16:52:49 +0100

        IMM: 2PBE patch-5 (Fix loading retry problem) [#21]

         If, after loading arbitration is done, loading is started but fails
        (for example due to a corrupt sqlite file), then the IMMNDs are
        restarted but not the IMMDs. The already collected loading arbitration
        info in the active IMMD is not cleared, and in the next loading attempt
        the loading arbitration will only wait for the stats from one of the
        SCs, then find it already has stats from both SCs, but actually one of
        them will be stats from the previous load (!). This can result in
        incorrect arbitration. The file for which loading fails will have been
        moved to imm.db.xxxxx.failed and is thus not used for arbitration. The
        fallback file is typically much older, but that may be masked by the
        old preload stats for that SC. The preload stats need to be cleared
        when and if loading is restarted.
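
The stale-stats problem described in patch-5 can be illustrated with a minimal
model. The Arbitration class and its method names are hypothetical, and the
stats are reduced to a single integer for the example.

```python
from typing import Dict


class Arbitration:
    """Hypothetical holder of the preload stats collected by the active IMMD."""

    def __init__(self, expected_scs: int = 2) -> None:
        self.expected_scs = expected_scs
        self.stats: Dict[str, int] = {}  # SC name -> assumed file-state epoch

    def report(self, sc: str, epoch: int) -> None:
        self.stats[sc] = epoch

    def complete(self) -> bool:
        return len(self.stats) >= self.expected_scs

    def on_loading_restart(self) -> None:
        # The fix: forget stats from the failed attempt so the next
        # arbitration waits for fresh stats from both SCs.
        self.stats.clear()
```

Without the call to on_loading_restart(), a single fresh report after a failed
load makes complete() return True while the other SC's entry still describes
the file that has since been moved to imm.db.xxxxx.failed.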

changeset 03064f5a3812a865e1c59abf59e1b51ace9d397e
Author: Anders Bjornerstedt <[email protected]>
Date:   Tue, 05 Nov 2013 17:00:20 +0100

        IMM: 2PBE patch-6 (PBE slave can re-attach when empty CCBs exist) [#21]

         When a 2PBE system is up and running with CCBs being generated, if one
        SC is rebooted, then after the SC has synced imm-ram, the slave PBE
        typically has trouble being allowed to generate its imm.db.xxxxx file.
        It keeps getting rejected due to active CCBs. There really should not
        be any active CCBs allowed here, because the sync of the returned SC
        would only have started when there are no active CCBs, and once sync is
        finished the imm should still not be persistent-writable. The only
        problem here is that empty CCBs are allowed to be created. Thus the
        condition for allowing the slave to generate its imm.db.xxxx file needs
        to be relaxed to allow empty CCBs.
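
The relaxed condition from patch-6 can be sketched as below. The function
names and the representation of active CCBs as a mapping from CCB id to its
number of operations are assumptions for the example, not the ImmModel data
structures.

```python
from typing import Dict


def slave_may_generate_strict(active_ccbs: Dict[int, int]) -> bool:
    """Old condition: any active CCB, even an empty one, blocks the slave."""
    return len(active_ccbs) == 0


def slave_may_generate_relaxed(active_ccbs: Dict[int, int]) -> bool:
    """Relaxed condition: only CCBs that contain operations block the slave."""
    return all(op_count == 0 for op_count in active_ccbs.values())
```

For example, with active_ccbs = {17: 0} the strict check rejects the slave PBE
while the relaxed check lets it generate its file.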


Complete diffstat:
------------------
 osaf/libs/agents/saf/imma/imma_db.c              |     4 +-
 osaf/libs/agents/saf/imma/imma_oi_api.c          |     4 +-
 osaf/libs/agents/saf/imma/imma_proc.c            |    34 +-
 osaf/libs/common/immsv/immpbe_dump.cc            |   148 +++-
 osaf/libs/common/immsv/immsv_evt.c               |    55 +-
 osaf/libs/common/immsv/include/immpbe_dump.hh    |    10 +-
 osaf/libs/common/immsv/include/immsv_api.h       |    21 +-
 osaf/libs/common/immsv/include/immsv_evt.h       |    18 +-
 osaf/libs/common/immsv/include/immsv_evt_model.h |     4 +
 osaf/services/saf/immsv/immd/immd_amf.c          |     5 +-
 osaf/services/saf/immsv/immd/immd_cb.h           |     8 +-
 osaf/services/saf/immsv/immd/immd_db.c           |     5 +
 osaf/services/saf/immsv/immd/immd_evt.c          |    82 ++-
 osaf/services/saf/immsv/immd/immd_main.c         |    51 +-
 osaf/services/saf/immsv/immd/immd_mbcsv.c        |     2 +-
 osaf/services/saf/immsv/immd/immd_proc.c         |   241 ++++++-
 osaf/services/saf/immsv/immd/immd_proc.h         |     3 +-
 osaf/services/saf/immsv/immd/immd_sbevt.c        |    47 +-
 osaf/services/saf/immsv/immloadd/imm_loader.cc   |   323 +++++++-
 osaf/services/saf/immsv/immloadd/imm_loader.hh   |     8 +-
 osaf/services/saf/immsv/immloadd/imm_pbe_load.cc |   224 +++++-
 osaf/services/saf/immsv/immnd/ImmModel.cc        |   583 ++++++++++++---
 osaf/services/saf/immsv/immnd/ImmModel.hh        |    26 +-
 osaf/services/saf/immsv/immnd/ImmSearchOp.cc     |     5 -
 osaf/services/saf/immsv/immnd/immnd_cb.h         |    10 +-
 osaf/services/saf/immsv/immnd/immnd_evt.c        |   582 ++++++++++++++--
 osaf/services/saf/immsv/immnd/immnd_init.h       |    19 +-
 osaf/services/saf/immsv/immnd/immnd_main.c       |     3 +-
 osaf/services/saf/immsv/immnd/immnd_proc.c       |   319 +++++++-
 osaf/services/saf/immsv/immpbed/immpbe.cc        |    77 +-
 osaf/services/saf/immsv/immpbed/immpbe.hh        |     4 +
 osaf/services/saf/immsv/immpbed/immpbe_daemon.cc |  2021 ++++++++++++++++++++++++++++++++++++++++++--------------
 32 files changed, 3969 insertions(+), 977 deletions(-)


Testing Commands:
-----------------
2PBE is enabled by uncommenting the following environment variable in immd.conf:

     export IMMSV_2PBE_PEER_SC_MAX_WAIT=30



Testing, Expected Results:
--------------------------
2PBE should work, incrementally dumping all persistent data changes to
sqlite files at both SC-1 and SC-2.


Conditions of Submission:
-------------------------
Ack from Neel


Arch      Built     Started    Linux distro
-------------------------------------------
mips        n          n
mips64      n          n
x86         n          n
x86_64      n          n
powerpc     n          n
powerpc64   n          n


Reviewer Checklist:
-------------------
[Submitters: make sure that your review doesn't trigger any checkmarks!]


Your checkin has not passed review because (see checked entries):

___ Your RR template is generally incomplete; it has too many blank entries
    that need proper data filled in.

___ You have failed to nominate the proper persons for review and push.

___ Your patches do not have proper short+long header

___ You have grammar/spelling in your header that is unacceptable.

___ You have exceeded a sensible line length in your headers/comments/text.

___ You have failed to put in a proper Trac Ticket # into your commits.

___ You have incorrectly put/left internal data in your comments/files
    (i.e. internal bug tracking tool IDs, product names etc)

___ You have not given any evidence of testing beyond basic build tests.
    Demonstrate some level of runtime or other sanity testing.

___ You have ^M present in some of your files. These have to be removed.

___ You have needlessly changed whitespace or added whitespace crimes
    like trailing spaces, or spaces before tabs.

___ You have mixed real technical changes with whitespace and other
    cosmetic code cleanup changes. These have to be separate commits.

___ You need to refactor your submission into logical chunks; there is
    too much content in a single commit.

___ You have extraneous garbage in your review (merge commits etc)

___ You have giant attachments which should never have been sent;
    Instead you should place your content in a public tree to be pulled.

___ You have too many commits attached to an e-mail; resend as threaded
    commits, or place in a public tree for a pull.

___ You have resent this content multiple times without a clear indication
    of what has changed between each re-send.

___ You have failed to adequately and individually address all of the
    comments and change requests that were proposed in the initial review.

___ You have a misconfigured ~/.hgrc file (i.e. username, email etc)

___ Your computer has a badly configured date and time, confusing
    the threaded patch review.

___ Your changes affect IPC mechanism, and you don't present any results
    for in-service upgradability test.

___ Your changes affect user manual and documentation, your patch series
    do not contain the patch that updates the Doxygen manual.


