OK, looking into it.
Thanks
/AndersBj
________________________________
From: Neelakanta Reddy [mailto:[email protected]]
Sent: 8 November 2013 11:23
To: Anders Björnerstedt
Cc: [email protected]
Subject: Re: [PATCH 0 of 6] Review Request for 2PBE - updated patch stack

Hi Andersbj,

While testing, I am not able to proceed:

case 1: 2PBE is not configured and PBE is also not configured.

Getting the following error while starting OpenSAF:

Nov 8 15:38:59 Slot-3 kernel: TIPC: Own node address <1.1.1>, network identity 1234
Nov 8 15:38:59 Slot-3 kernel: TIPC: Enabled bearer <eth:eth1>, discovery domain <1.1.0>, priority 10
Nov 8 15:38:59 Slot-3 osafrded[4713]: Started
Nov 8 15:39:01 Slot-3 osafrded[4713]: NO Peer not available => Active role
Nov 8 15:39:01 Slot-3 osaffmd[4727]: Started
Nov 8 15:39:01 Slot-3 osafimmd[4742]: Started
Nov 8 15:39:01 Slot-3 osafimmnd[4757]: Started
Nov 8 15:39:01 Slot-3 osafimmd[4742]: NO New IMMND process is on ACTIVE Controller at 2010f
Nov 8 15:39:01 Slot-3 osafimmd[4742]: NO First SC IMMND (OpenSAF 4.4 or later) attached 2010f
Nov 8 15:39:01 Slot-3 osafimmd[4742]: NO First IMMND at SC to attach is NOT configured for PBE
Nov 8 15:39:01 Slot-3 osafimmd[4742]: NO First IMMND on SC found at 2010f this IMMD at 2010f. Cluster is loading, *not* 2PBE => designating that IMMND as coordinator
Nov 8 15:39:01 Slot-3 osafimmnd[4757]: NO SERVER STATE: IMM_SERVER_ANONYMOUS --> IMM_SERVER_CLUSTER_WAITING
Nov 8 15:39:01 Slot-3 osafimmnd[4757]: NO This IMMND is now the NEW Coord
Nov 8 15:39:04 Slot-3 osafimmnd[4757]: NO SERVER STATE: IMM_SERVER_CLUSTER_WAITING --> IMM_SERVER_LOADING_PENDING
Nov 8 15:39:04 Slot-3 osafimmd[4742]: NO Successfully announced loading. New ruling epoch:1
Nov 8 15:39:04 Slot-3 osafimmnd[4757]: NO SERVER STATE: IMM_SERVER_LOADING_PENDING --> IMM_SERVER_LOADING_SERVER
Nov 8 15:39:04 Slot-3 osafimmnd[4757]: NO NODE STATE-> IMM_NODE_LOADING
Nov 8 15:39:04 Slot-3 osafimmnd[4757]: ER LOADING APPARENTLY FAILED status:0
Nov 8 15:39:04 Slot-3 osafimmd[4742]: ER ******** LOADING FAILED. File(s) possibly missing, inaccessible or corrupt .. ? *********
Nov 8 15:39:04 Slot-3 opensafd[4681]: ER Failed DESC:IMMND
Nov 8 15:39:04 Slot-3 opensafd[4681]: ER Going for recovery
Nov 8 15:39:04 Slot-3 opensafd[4681]: ER Trying To RESPAWN /usr/local/lib/opensaf/clc-cli/osaf-immnd attempt #1
Nov 8 15:39:04 Slot-3 opensafd[4681]: ER Sending SIGKILL to IMMND, pid=4749
Nov 8 15:39:04 Slot-3 osafimmnd[4757]: ER IMMND - Periodic server job failed

case 2: 2PBE is configured, PBE is configured and no imm.db is present:

Nov 8 15:42:10 Slot-4 kernel: TIPC: Enabled bearer <eth:eth1>, discovery domain <1.1.0>, priority 10
Nov 8 15:42:10 Slot-4 osafrded[3983]: Started
Nov 8 15:42:12 Slot-4 osafrded[3983]: NO Peer not available => Active role
Nov 8 15:42:12 Slot-4 osaffmd[3997]: Started
Nov 8 15:42:12 Slot-4 osafimmd[4012]: Started
Nov 8 15:42:12 Slot-4 osafimmd[4012]: NO 2PBE configured with IMMSV_PEER_SC_MAX_WAIT: 30 seconds
Nov 8 15:42:12 Slot-4 osafimmnd[4027]: Started
Nov 8 15:42:12 Slot-4 osafimmnd[4027]: NO Persistent Back-End capability configured, Pbe file:imm.db (suffix may get added)
Nov 8 15:42:12 Slot-4 osafimmd[4012]: NO New IMMND process is on ACTIVE Controller at 2020f
Nov 8 15:42:12 Slot-4 osafimmd[4012]: NO Extended intro from node 2020f
Nov 8 15:42:12 Slot-4 osafimmd[4012]: NO First SC IMMND (OpenSAF 4.4 or later) attached 2020f
Nov 8 15:42:12 Slot-4 osafimmd[4012]: NO IMMND on SC found at 2020f this IMMD at 2020f. Cluster is loading. 2PBE configured => Wait.
Nov 8 15:42:12 Slot-4 osafimmnd[4027]: NO SERVER STATE: IMM_SERVER_ANONYMOUS --> IMM_SERVER_CLUSTER_WAITING
Nov 8 15:42:12 Slot-4 osafimmnd[4027]: NO 2PBE startup arbitration initiated from IMMD
Nov 8 15:42:12 Slot-4 osafimmnd[4027]: NO 2PBE configured, IMMSV_PBE_FILE_SUFFIX:.2020f (preload)
Nov 8 15:42:12 Slot-4 osafimmloadd: logtrace: trace enabled to file /var/log/opensaf/osafimmnd, mask=0xffffffff
Nov 8 15:42:12 Slot-4 osafimmloadd: NO 2PBE pre-load starting
Nov 8 15:42:12 Slot-4 osafimmloadd: NO IMMSV_PBE_FILE is defined (imm.db.2020f) check it for existence and SaImmRepositoryInitModeT
Nov 8 15:42:12 Slot-4 osafimmloadd: IN File '/etc/opensaf/imm.db.2020f' is not accessible for read/write, cause:No such file or directory
Nov 8 15:42:12 Slot-4 osafimmloadd: WA Could not open repository:imm.db.2020f
Nov 8 15:42:12 Slot-4 osafimmloadd: NO Trying without suffix
Nov 8 15:42:12 Slot-4 osafimmloadd: IN File '/etc/opensaf/imm.db' is not accessible for read/write, cause:No such file or directory
Nov 8 15:42:12 Slot-4 osafimmloadd: NO 2PBE: Pre-loading from XML file imm.xml at /etc/opensaf
Nov 8 15:42:13 Slot-4 osafimmd[4012]: NO 2PBE wait. Passed time:1089 new timeout: 28911 msecs
Nov 8 15:42:14 Slot-4 osafimmd[4012]: NO 2PBE wait. Passed time:2106 new timeout: 27894 msecs
Nov 8 15:42:15 Slot-4 osafimmd[4012]: NO 2PBE wait. Passed time:3127 new timeout: 26873 msecs
Nov 8 15:42:16 Slot-4 osafimmd[4012]: NO 2PBE wait. Passed time:4144 new timeout: 25856 msecs
Nov 8 15:42:17 Slot-4 osafimmd[4012]: NO 2PBE wait. Passed time:5161 new timeout: 24839 msecs
Nov 8 15:42:18 Slot-4 osafimmd[4012]: NO 2PBE wait. Passed time:6175 new timeout: 23825 msecs
Nov 8 15:42:19 Slot-4 osafimmloadd: WA Failed to create the class OsafImmPbeRt err:6
Nov 8 15:42:19 Slot-4 osafimmloadd: ER Failed to create the class OsafImmPbeRt - exiting
Nov 8 15:42:19 Slot-4 osafimmnd[4027]: ER Prealoader failed to obtain stats.
Nov 8 15:42:19 Slot-4 opensafd[3951]: ER Failed DESC:IMMND
Nov 8 15:42:19 Slot-4 opensafd[3951]: ER Going for recovery
Nov 8 15:42:19 Slot-4 opensafd[3951]: ER Trying To RESPAWN /usr/local/lib/opensaf/clc-cli/osaf-immnd attempt #1
Nov 8 15:42:19 Slot-4 opensafd[3951]: ER Sending SIGKILL to IMMND, pid=4018

/Neel.

On Tuesday 05 November 2013 09:50 PM, Anders Bjornerstedt wrote:

Summary: IMM: 2PBE new version of the patch stack [#21]
Review request for Trac Ticket(s): 21
Peer Reviewer(s): Neel
Pull request to:
Affected branch(es): default (4.4)
Development branch:

--------------------------------
Impacted area       Impact y/n
--------------------------------
 Docs                    n
 Build system            n
 RPM/packaging           n
 Configuration files     n
 Startup scripts         n
 SAF services            y
 OpenSAF services        n
 Core libraries          n
 Samples                 n
 Tests                   n
 Other                   n

Comments (indicate scope for each "y" above):
---------------------------------------------
The patch stack for the 2PBE enhancement has been updated. This is just a new review request for enhancement #21.

The first patch (loading) has been adjusted to apply cleanly on top of changeset:

changeset:   4588:393ca121ca7c
user:        Anders Bjornerstedt <[email protected]><mailto:[email protected]>
date:        Mon Nov 04 18:29:25 2013 +0100
summary:     IMM: IMMD file verification made upgrade safe [#596]

The second and third patches are unchanged. Three additional fix patches follow on top of that.

changeset 0c554fd3174b67eac5c599af6d6d2cc97b126e51
Author: Anders Bjornerstedt <[email protected]><mailto:[email protected]>
Date: Mon, 28 Oct 2013 10:25:27 +0100

IMM: 2PBE patch-1 (loading) [#21]

This patch contains the 2PBE loading mechanism, needed to support 2PBE. The IMMDs detect 2PBE loading when the IMMSV_2PBE_PEER_SC_MAX_WAIT environment variable is set in the immd.conf file.

The active IMMD will order each SC IMMND to execute a "preload", probing the SC-local file system for the file state that would be loaded into the cluster if that IMMND were chosen as coord.
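The probe order is visible in the osafimmloadd log from case 2 above. The preload first tries imm.db with the SC suffix, then imm.db without the suffix, and finally falls back to imm.xml. Below is a minimal sketch of that order in C. It assumes that "accessible for read/write" corresponds to POSIX access(R_OK | W_OK). probe_preload_source() and enum preload_source are hypothetical names, not taken from imm_loader.cc, and the sketch does not gather the stats that are sent to the IMMD.

```c
#include <assert.h>
#include <stdio.h>
#include <unistd.h>

/* Hypothetical: which repository the loader would pick on this SC. */
enum preload_source { SRC_PBE_SUFFIXED, SRC_PBE_PLAIN, SRC_XML };

/* Sketch of the probe order seen in the osafimmloadd log:
 *   1. <dir>/<pbe_file><suffix>   e.g. /etc/opensaf/imm.db.2020f
 *   2. <dir>/<pbe_file>           e.g. /etc/opensaf/imm.db
 *   3. <dir>/imm.xml              (existence NOT checked in this sketch)
 * The read/write accessibility test is modelled with access(R_OK | W_OK).
 * The chosen path is written to 'path'. A real implementation should
 * also detect snprintf truncation. */
static enum preload_source probe_preload_source(const char *dir,
                                                const char *pbe_file,
                                                const char *suffix,
                                                char *path, size_t len)
{
    assert(dir && pbe_file && suffix && path && len > 0);

    snprintf(path, len, "%s/%s%s", dir, pbe_file, suffix);
    if (access(path, R_OK | W_OK) == 0)
        return SRC_PBE_SUFFIXED;

    snprintf(path, len, "%s/%s", dir, pbe_file); /* "Trying without suffix" */
    if (access(path, R_OK | W_OK) == 0)
        return SRC_PBE_PLAIN;

    snprintf(path, len, "%s/imm.xml", dir);      /* fall back to XML */
    return SRC_XML;
}
```

On Slot-4 in case 2, neither imm.db.2020f nor imm.db existed. This sketch would therefore return SRC_XML with the path /etc/opensaf/imm.xml, which matches the "Pre-loading from XML file" log line.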
The IMMND sends these stats to the active IMMD. The IMMD will wait for the IMMNDs at *both* SCs to complete this task and then determine which SC has the apparently latest file state. The IMMND at that SC will then be chosen as IMMND coord. Actual loading then proceeds in the same way as for regular 1PBE.

IMMSV_2PBE_PEER_SC_MAX_WAIT defaults to 30 seconds. This value should be high enough to make it very unlikely that the active IMMD is forced to choose the loader when only a single SC IMMND has joined. If that does happen, the risk is that the cluster restart will *not* use the latest persistent IMM state, effectively rewinding the IMM state. (Note: the same thing will happen with regular 1PBE based on a shared file system (DRBD) if one SC fails to come up in time to join the DRBD sync protocol. The corresponding DRBD timeout is on the order of 20 seconds.)

When loading has completed, additional 2PBE functionality will start two PBEs, one at each SC. That functionality is delivered in subsequent patches.

changeset 0f3cc59f1eb8031034ac41485cd75438ec17c4b1
Author: Anders Bjornerstedt <[email protected]><mailto:[email protected]>
Date: Fri, 11 Oct 2013 01:17:44 +0200

IMM: 2PBE patch-2 (dumping) [#21]

This patch contains the 2PBE dumping mechanism, needed to support 2PBE.

A PBE process is started by the IMMND at each SC, not just by the IMMND coord. The PBE colocated with the IMMND coord, called the primary PBE, is still the coordinator for transaction commits (for CCBs, PRT operations and class create/delete). The primary PBE (sometimes called the A-side PBE) works very much like the single PBE does in regular 1PBE.

The PBE colocated with the non-coord IMMND at the other SC is called the slave PBE (sometimes called the B-side PBE).

With 2PBE, *both* PBEs must be available for the IMM to be persistent-writable. If one or both PBEs are unavailable (or unresponsive), persistent writes (CCBs and PRT operations) will fail.
In 2PBE, a restarted PBE will more often need to regenerate the sqlite file. On the other hand, regenerating the sqlite file should be faster in 2PBE than in regular 1PBE, because the file is typically placed on a local file system.

A subsequent patch will provide a mechanism for allowing 1-safe 2PBE. This will allow the IMM to open up for persistent writes when only one of the two PBEs is available. This will only be allowed during the absence of an SC. As soon as the other SC rejoins, the IMM has to re-enter the non-persistent-writable state.

changeset d99a312f527c8fb701149ff840dce1bffe416d75
Author: Anders Bjornerstedt <[email protected]><mailto:[email protected]>
Date: Fri, 11 Oct 2013 02:08:51 +0200

IMM: 2PBE patch-3 (1safe2pbe) [#21]

This patch contains the 2PBE 1safe2PBE mechanism. This mechanism allows an OpenSAF cluster to open up for persistent writes, temporarily, using only one of the two PBEs. It is only intended as an emergency action when one SC is unavailable long-term, e.g. due to hardware problems. As soon as the other SC returns, the IMM has to re-enter normal 2-safe 2PBE and reject persistent writes until the slave PBE has synced (regenerated its sqlite file) and rejoined the cluster.

The 1safe2PBE state is entered by the administrative operation:

    immadm -o 1 -p opensafImmNostdFlags:SA_UINT32_T:8 \
        opensafImm=opensafImm,safApp=safImmService

It is exited either automatically when the other SC rejoins, or by an explicit administrative operation:

    immadm -o 2 -p opensafImmNostdFlags:SA_UINT32_T:8 \
        opensafImm=opensafImm,safApp=safImmService

changeset d4b720966e5e0769130163a21c5e9642d9e14864
Author: Anders Bjornerstedt <[email protected]><mailto:[email protected]>
Date: Tue, 05 Nov 2013 16:41:05 +0100

IMM: 2PBE patch-4 (Fix for si-swap problem) [#21]

With the 2PBE patches applied and 2PBE configured, if si-swap:

    immadm -o 7 safSi=SC-2N,safApp=OpenSAF

is attempted twice, the second attempt will cause the new active SC to reboot.
This was caused by the cb->is_loading variable being initialized to true at both the active and the standby SC, when it should only have been set to true at the active. When it is true also at the standby, it is not reset to false when loading completes. When that standby becomes active, it will not mbcp fevs messages to the new standby, even though loading was done a long time ago. The next si-swap then causes the new new active to start sending fevs messages numbered far below the expected fevs number. This causes the new new standby to crash.

changeset ec4d0f6e1b90ab7dae6d2ab923aafe03f90c3f71
Author: Anders Bjornerstedt <[email protected]><mailto:[email protected]>
Date: Tue, 05 Nov 2013 16:52:49 +0100

IMM: 2PBE patch-5 (Fix loading retry problem) [#21]

Suppose loading arbitration is done and loading starts but fails (for example due to a corrupt sqlite file). Then the IMMNDs are restarted, but the IMMDs are not. The loading arbitration info already collected in the active IMMD is not cleared. In the next loading attempt, the arbitration will only wait for the stats from one of the SCs. It will then find that it already has stats from both SCs, although one set is actually stats from the previous load (!). This can result in incorrect arbitration. The file for which loading failed will have been moved to imm.db.xxxxx.failed and is thus not used for arbitration. The fallback file is typically much older, but that may be masked by the old preload stats for that SC. The preload stats therefore need to be cleared when and if loading is restarted.

changeset 03064f5a3812a865e1c59abf59e1b51ace9d397e
Author: Anders Bjornerstedt <[email protected]><mailto:[email protected]>
Date: Tue, 05 Nov 2013 17:00:20 +0100

IMM: 2PBE patch-6 (PBE slave can re-attach when empty CCBs exist) [#21]

Suppose a 2PBE system is up and running with CCBs being generated, and one SC is rebooted. After that SC has synced imm-ram, the slave PBE typically has trouble being allowed to generate its imm.db.xxxxx file.
It keeps getting rejected due to active CCBs. There really should not be any active CCBs allowed here. The sync of the returned SC would only have started when there were no active CCBs, and once sync is finished the IMM should still not be persistent-writable. The only problem here is that empty CCBs are allowed to be created. Thus the condition for allowing the slave to generate its imm.db.xxxx file needs to be relaxed to allow empty CCBs.

Complete diffstat:
------------------
 osaf/libs/agents/saf/imma/imma_db.c | 4 +-
 osaf/libs/agents/saf/imma/imma_oi_api.c | 4 +-
 osaf/libs/agents/saf/imma/imma_proc.c | 34 +-
 osaf/libs/common/immsv/immpbe_dump.cc | 148 +++-
 osaf/libs/common/immsv/immsv_evt.c | 55 +-
 osaf/libs/common/immsv/include/immpbe_dump.hh | 10 +-
 osaf/libs/common/immsv/include/immsv_api.h | 21 +-
 osaf/libs/common/immsv/include/immsv_evt.h | 18 +-
 osaf/libs/common/immsv/include/immsv_evt_model.h | 4 +
 osaf/services/saf/immsv/immd/immd_amf.c | 5 +-
 osaf/services/saf/immsv/immd/immd_cb.h | 8 +-
 osaf/services/saf/immsv/immd/immd_db.c | 5 +
 osaf/services/saf/immsv/immd/immd_evt.c | 82 ++-
 osaf/services/saf/immsv/immd/immd_main.c | 51 +-
 osaf/services/saf/immsv/immd/immd_mbcsv.c | 2 +-
 osaf/services/saf/immsv/immd/immd_proc.c | 241 ++++++-
 osaf/services/saf/immsv/immd/immd_proc.h | 3 +-
 osaf/services/saf/immsv/immd/immd_sbevt.c | 47 +-
 osaf/services/saf/immsv/immloadd/imm_loader.cc | 323 +++++++-
 osaf/services/saf/immsv/immloadd/imm_loader.hh | 8 +-
 osaf/services/saf/immsv/immloadd/imm_pbe_load.cc | 224 +++++-
 osaf/services/saf/immsv/immnd/ImmModel.cc | 583 ++++++++++++---
 osaf/services/saf/immsv/immnd/ImmModel.hh | 26 +-
 osaf/services/saf/immsv/immnd/ImmSearchOp.cc | 5 -
 osaf/services/saf/immsv/immnd/immnd_cb.h | 10 +-
 osaf/services/saf/immsv/immnd/immnd_evt.c | 582 ++++++++++++++--
 osaf/services/saf/immsv/immnd/immnd_init.h | 19 +-
 osaf/services/saf/immsv/immnd/immnd_main.c | 3 +-
 osaf/services/saf/immsv/immnd/immnd_proc.c | 319 +++++++-
 osaf/services/saf/immsv/immpbed/immpbe.cc | 77 +-
 osaf/services/saf/immsv/immpbed/immpbe.hh | 4 +
 osaf/services/saf/immsv/immpbed/immpbe_daemon.cc | 2021 ++++++++++++++++++++++++++++++++++++++++++--------------------
 32 files changed, 3969 insertions(+), 977 deletions(-)

Testing Commands:
-----------------
2PBE is enabled by uncommenting this environment variable in immd.conf:

    export IMMSV_2PBE_PEER_SC_MAX_WAIT=30

Testing, Expected Results:
--------------------------
2PBE should work, incrementally dumping all persistent data changes to sqlite files at both SC-1 and SC-2.

Conditions of Submission:
-------------------------
Ack from Neel

Arch      Built     Started    Linux distro
-------------------------------------------
mips        n          n
mips64      n          n
x86         n          n
x86_64      n          n
powerpc     n          n
powerpc64   n          n

Reviewer Checklist:
-------------------
[Submitters: make sure that your review doesn't trigger any checkmarks!]

Your checkin has not passed review because (see checked entries):

___ Your RR template is generally incomplete; it has too many blank entries that need proper data filled in.
___ You have failed to nominate the proper persons for review and push.
___ Your patches do not have proper short+long header
___ You have grammar/spelling in your header that is unacceptable.
___ You have exceeded a sensible line length in your headers/comments/text.
___ You have failed to put in a proper Trac Ticket # into your commits.
___ You have incorrectly put/left internal data in your comments/files (i.e. internal bug tracking tool IDs, product names etc)
___ You have not given any evidence of testing beyond basic build tests. Demonstrate some level of runtime or other sanity testing.
___ You have ^M present in some of your files. These have to be removed.
___ You have needlessly changed whitespace or added whitespace crimes like trailing spaces, or spaces before tabs.
___ You have mixed real technical changes with whitespace and other cosmetic code cleanup changes. These have to be separate commits.
___ You need to refactor your submission into logical chunks; there is too much content in a single commit.
___ You have extraneous garbage in your review (merge commits etc)
___ You have giant attachments which should never have been sent; instead you should place your content in a public tree to be pulled.
___ You have too many commits attached to an e-mail; resend as threaded commits, or place in a public tree for a pull.
___ You have resent this content multiple times without a clear indication of what has changed between each re-send.
___ You have failed to adequately and individually address all of the comments and change requests that were proposed in the initial review.
___ You have a misconfigured ~/.hgrc file (i.e. username, email etc)
___ Your computer has a badly configured date and time, confusing the threaded patch review.
___ Your changes affect the IPC mechanism, and you don't present any results for in-service upgradability test.
___ Your changes affect the user manual and documentation, but your patch series does not contain the patch that updates the Doxygen manual.

_______________________________________________
Opensaf-devel mailing list
[email protected]
https://lists.sourceforge.net/lists/listinfo/opensaf-devel
