Hi Anders,
Following are the comments/questions for the given patch set:
1. check has to be there if OPENSAF_IMM_PBE_RT_CLASS_NAME is not found
@@ -942,6 +942,14 @@ int loadImmFromPbe(void* pbeHandle, bool
TRACE_2("Successfully Applied the CCB ");
saImmOmCcbFinalize(ccbHandle);
+ cimi =
classInfoMap.find(std::string(OPENSAF_IMM_PBE_RT_CLASS_NAME));
+ LOG_NO("Removed class %s RC:%u", (char *)
OPENSAF_IMM_PBE_RT_CLASS_NAME,
+ saImmOmClassDelete(immHandle, (char *)
OPENSAF_IMM_PBE_RT_CLASS_NAME));
+
+ /* if(cimi == classInfoMap.end()) {*/
+ opensafPbeRtClassCreate(immHandle);
+ /* }*/
+
2. creating of OsafImmPbeRt should not have been allowed in the pre-load
time.
osafimmloadd: ER Failed to create the class OsafImmPbeRt err:6
osafimmloadd: NO The class OsafImmPbeRt has been created since it was
missing from the imm.xml load file
3. when the pbe tries to update the epoch immeadietly after loading
following error is returned from the slave:
osafimmpbed: NO Got ERR_NOT_EXIST on atempt to update epoch towards
slave PBE
osafimmpbed: NO Update epoch 4 committing with ccbId:4294967299
osafimmnd[4773]: NO Implementer (applier) connected: 15 (@OpenSafImmPBE)
<0, 2010f>
osafimmnd[4773]: NO Implementer connected: 16 (OsafImmPbeRt_B) <0, 2010f>
4. The Adminoperation towards slave PBE for class create/delete and ccb
apply may effect the IMMSV synchronous timeout.
5. creating OPENSAF_IMM_PBE_RT_CLASS_NAME while loading if 2PBE is not
set must be avoided.
6. what is the use of creating OPENSAF_IMM_PBE_RT_IMPL_NAME_A
7. If the callback functions are same, what is the use of creating
OPENSAF_IMM_PBE_RT_IMPL_NAME_A and OPENSAF_IMM_PBE_RT_IMPL_NAME_B.
manipulating of present pbe callback functions may be considered. what
is the real use of these implementers.
8. when the same class created/deleted and PRTO create/update is
performed, the ccb ID is more in the slave PBE. once the cluster is
restarted always slave PBE will have greater ccbid which is chosen by
IMMD arbiteration.
9. while continuely creating ccb-creates, remove the imm.db(a case like
nfs failure/stop) then slave PBE tries to recreate the db but fails.
where In the primary PBE failed to aplly the PBE because slave PBE is
not there:
osafimmpbed: WA Failed to find CCB object for 18407
osafimmpbed: WA Start prepare for ccb: 18408 towards slave PBE returned:
'12' from Immsv
osafimmpbed: WA Ccb:18408 failed to prepare towards slave PBE
osafimmnd[30144]: NO Ccb 18408 ABORTED (immcfg_Slot-3_330)
osafimmpbed: WA Failed to find CCB object for 18408
osafimmpbed: WA Start prepare for ccb: 18409 towards slave PBE returned:
'12' from Immsv
osafimmpbed: WA Ccb:18409 failed to prepare towards slave PBE
osafimmnd[30144]: NO Ccb 18409 ABORTED (immcfg_Slot-3_333)
osafimmpbed: WA Failed to find CCB object for 18409
The slave PBE can not be able to do classimplementerset
Oct 4 15:05:58 Slot-4 osafimmnd[3039]: NO ERR_TRY_AGAIN: ccb 9657 is
active on object cscfRdn=75387 of class neNumber. Can not add class applier
Oct 4 15:06:04 Slot-4 last message repeated 10 times
Oct 4 15:06:04 Slot-4 osafimmpbed: ER saImmOiClassImplementerSet for
neNumber failed 6
Oct 4 15:06:04 Slot-4 osafimmnd[3039]: NO Implementer locally
disconnected. Marking it as doomed 131 <429, 2020f> (@OpenSafImmPBE)
Oct 4 15:06:04 Slot-4 osafimmnd[3039]: NO Implementer locally
disconnected. Marking it as doomed 132 <430, 2020f> (OsafImmPbeRt_B)
Oct 4 15:06:04 Slot-4 osafimmnd[3039]: NO Implementer disconnected 131
<429, 2020f> (@OpenSafImmPBE)
Oct 4 15:06:04 Slot-4 osafimmnd[3039]: NO Implementer disconnected 132
<430, 2020f> (OsafImmPbeRt_B)
Oct 4 15:06:04 Slot-4 osafimmnd[3039]: WA SLAVE PBE process has
apparently died at non coord
Oct 4 15:06:04 Slot-4 osafimmnd[3039]: NO STARTING SLAVE PBE process.
10. In the master-slave of 2PBE approach, another level of time
dependecy is created(by calling admin operations). using atomic approach
for both PBE's may be faster because, PBEB (@OpenSafImmPBE) receives the
callbacks same as master(PBEA).
Thanks,
Neel.
On Friday 19 July 2013 03:25 PM, Anders Bjornerstedt wrote:
> Summary: Pre-review of 2PBE complete dataflow.
> Review request for Trac Ticket(s): (#21)
> Peer Reviewer(s): Neel
> Pull request to:
> Affected branch(es): devel(4.4)
> Development branch:
>
> --------------------------------
> Impacted area Impact y/n
> --------------------------------
> Docs n
> Build system n
> RPM/packaging n
> Configuration files n
> Startup scripts n
> SAF services n
> OpenSAF services n
> Core libraries n
> Samples n
> Tests n
> Other n
>
>
> Comments (indicate scope for each "y" above):
> ---------------------------------------------
>
> Thisd is a pre-review of the basic data flow solution that I propose for 2PBE.
> By pre-review I mean:
> These specific patches will not be pushed.
> Thus I am not waiting for an "ack" on these patches.
> The intention is to communicate the development of 2PBE so far and to allow
> anyone to test and experimentat with it, or just inspect the code to get
> an understanding of how it will work.
>
> This is not a complete solution for 2PBE.
> There is still work to be done to get tighter syncronisation in the commit
> of an imm-transaction between the two PBEs. With regular single PBE, the
> commit
> of one imm-transaction was atomic with the commit with the commit to the
> sqlite file.
> ion fact the commit of the sqlite transaction *was* the commit of the
> imm-transaction.
>
> With 2PBE, the commit of the transaction to PBE is still the commit of the
> imm-transaction.
> But now there is also the issue of if and how to get the two sqlite files to
> commit atomically.
> My current stance on this is that I will not attempt to make the commit to
> the two sqlite files
> 100% atomic. This would either require at least two sqlite transaction
> commits/writes for every
> one imm-transaction, or require the utilization of some open 2pc interface in
> sqlite if that
> is available. I believe there is some kind of callback hook availble for
> sqlite transaction
> prepare. But I will not attempt to use that for now. The basic idea is
> instead to keep
> one primary PBE and then add a slave PBE. To make each imm-transaction commit
> with
> syncronization between the two PBEs before sqlite commit, minimizing the
> probability that
> one sqlite instance succeeds in commiting while the other sqlite instance
> fails.
> Still they may of course diverge, due to file system problems or crashes.
> The first thing to note here is that a cluster start, after such a broken PBE
> commit,
> will be handled by the loading arbitration and that arbitration will choose
> to load
> from the sqlite file that succeeded in the commit (the latest file) as long
> as it
> is available. And even if the cluster restart should come up with one SC with
> the
> file that failed to commit the last ccb and timeout in waiting for the other
> and
> thus load from the older file, then there is also no problem because this
> only means
> that the transaction will have been aborted. Note that the user/client of
> the transaction
> will not have obtained any answer on the outcome unless both PBEs succeeded
> in their commit.
>
> For regular CCBs, the primary PBE will prepare by executing all the sqlite
> calls needed to
> build the transaction, but before comitting it sends a a message to the slave
> PBE asking
> it it has received all requests to be part of the ccb and if so to also
> execute all
> the sqlite calls to prepare the ccb. When/if the standby replies with ok,
> then the
> primary PBE will commit its sqlite transaction and reply to immsv. The immsv
> will then
> commit the ccb in imm-cluster-ram. Finaly as part of sending messages about
> the commit
> of the ccb to all appliers, the slave PBE will get both completed and apply
> and will
> (hopefully) commit its sqlite transaction.
>
> In general, the slave PBE is tightly controlled by the primary PBE (an
> asymetric solution).
> For class management and PRTO/PRTA handling the solution is slightly
> different.
> In general I have also tried to leverage from existing mechanisms, such as
> making the
> slave PBE an applier of all config classes, thus avoiding additional and
> unnecessary
> distributed messaging for the payload of a ccb.
>
> changeset 131d635a526a0e894245ecb08b630268f9035976
> Author: Anders Bjornerstedt <[email protected]>
> Date: Fri, 19 Jul 2013 08:39:25 +0200
>
> IMM: 2PBE loading (test patch-1) [#21]
>
> This is a testpatch containing a test version of the 2PBE loading
> mechanism.
> This patch will not be pushed. The intent with this patch is to allow
> testing and obtaining feedback on the 2PBE loading.
>
> This firt patch has aready been sent out once earlier for pre-review.
> Keep in mind that it is for cluster restarts that all this is intended.
> So in one sense this is the most important patch to be as reliable as
> possible.
> --------------------------------------------------------------
>
> changeset f240bedd6aa2b92bdea38d1c6532ac8097ac9444
> Author: Anders Bjornerstedt <[email protected]>
> Date: Fri, 19 Jul 2013 08:45:07 +0200
>
> IMM: 2PBE ccb-handling (test patch-2) [#21]
>
> This is a testpatch containing a test version of the 2PBE ccb-handling.
> This
> patch will not be pushed. The intent with this patch is to allow
> testing and
> obtaining feedback on the 2PBE ccb-handling. This test-patch goes on
> top of
> the "2PBE loading" testpatch.
>
> Contains the process management changes to start and restart 2 PBEs,
> including failover
> handling. Plus the data flow solution for CCBs. THe detailed synhronisation
> of ccb commit
> between the two PBEs is still missing.
> -------------------------------------------------------------
>
> changeset 421acf69d0765d0860bfe726b8f1ba1ccb519e6b
> Author: Anders Bjornerstedt <[email protected]>
> Date: Fri, 19 Jul 2013 08:55:23 +0200
>
> IMM: 2PBE class-create/delete/schema change handling (test patch-3)
> [#21]
>
> This is a testpatch containing a test version of the 2PBE
> class-handling.
> This patch will not be pushed. The intent with this patch is to allow
> testing and obtaining feedback on the 2PBE handling of imm classes. This
> test-patch goes on top of the "2PBE ccb-handling" testpatch.
>
> Provides 2PBE persistification of class create/delete/schema-change.
> Extends the existing PBE oultion that uses an admin-op from immsv to PBE
> so that the primary PBE invokes the same admin-op towards slave.
>
> ---------------------------------------------------------------
>
> changeset d6f91100b51e2b7215bf5fab95e10d0c069d14db
> Author: Anders Bjornerstedt <[email protected]>
> Date: Fri, 19 Jul 2013 09:10:43 +0200
>
> IMM: 2PBE PRTO-create handling (test patch-4) [#21]
>
> This is a testpatch containing a test version of the 2PBE handling of
> creates of persistent runtime objects (PRTOs). This patch will not be
> pushed. The intent with this patch is to allow testing and obtaining
> feedback on the 2PBE handling of PRTO creates This test-patch goes on
> top of
> the "2PBE class-handling" testpatch.
>
> Provides 2PBE persistification of PRTO create.
> In this case the immsv directly sends the same payload callbacks to both the
> primary
> and slave PBE. The primary will invoke an admin-op towards the slave to
> syncronize
> (not implemented).
>
> changeset 416b82d2d116c2a9ede63b507c72c10ee2126f42
> Author: Anders Bjornerstedt <[email protected]>
> Date: Fri, 19 Jul 2013 09:16:17 +0200
>
> IMM: 2PBE PRTO-delete handling (test patch-5) [#21]
>
> This is a testpatch containing a test version of the 2PBE handling of
> deletes of persistent runtime objects (PRTOs). This patch will not be
> pushed. The intent with this patch is to allow testing and obtaining
> feedback on the 2PBE handling of PRTO deletes This test-patch goes on
> top of
> the "2PBE PRTO-create" testpatch.
>
> Provides 2PBE persistification of PRTO delete. This operation is quite
> different
> from PRTO create because PRTO delete can in general be the delete of a
> subtree,
> i.e. several PRTOs that have to be deleted as one transaction. Again the immsv
> sends the same messages that it sent to the PBE, now also to the slave PBE.
>
> changeset 894df553ef7e1b829db43329a5bafe4ce7fb2cbd
> Author: Anders Bjornerstedt <[email protected]>
> Date: Fri, 19 Jul 2013 09:20:45 +0200
>
> IMM: 2PBE PRTA-update handling (test patch-6) [#21]
>
> This is a testpatch containing a test version of the 2PBE handling of
> updates of persistent runtime attributes (PRTAs). PRTAs can exist in
> either
> PRTOs or config objects. This patch will not be pushed. The intent with
> this
> patch is to allow testing and obtaining feedback on the 2PBE handling of
> PRTA updates This test-patch goes on top of the "2PBE PRTO-delete"
>
> Provides 2PBE persistification of PRTA updates. This more similar to PRTO
> create
> than it is PRTO delete, but PRTA updaes can be an update in a config object.
>
>
>
> Complete diffstat:
> ------------------
> osaf/libs/agents/saf/imma/imma_oi_api.c | 4 +-
> osaf/libs/agents/saf/imma/imma_proc.c | 34 +++-
> osaf/libs/common/immsv/immpbe_dump.cc | 62 ++++++-
> osaf/libs/common/immsv/immsv_evt.c | 55 ++++++-
> osaf/libs/common/immsv/include/immpbe_dump.hh | 6 +-
> osaf/libs/common/immsv/include/immsv_api.h | 17 +-
> osaf/libs/common/immsv/include/immsv_evt.h | 18 +-
> osaf/libs/common/immsv/include/immsv_evt_model.h | 4 +
> osaf/services/saf/immsv/immd/immd_amf.c | 5 +-
> osaf/services/saf/immsv/immd/immd_cb.h | 6 +-
> osaf/services/saf/immsv/immd/immd_db.c | 2 +
> osaf/services/saf/immsv/immd/immd_evt.c | 82 ++++++++++-
> osaf/services/saf/immsv/immd/immd_main.c | 40 +++++-
> osaf/services/saf/immsv/immd/immd_proc.c | 229
> +++++++++++++++++++++++++++++-
> osaf/services/saf/immsv/immd/immd_proc.h | 3 +-
> osaf/services/saf/immsv/immd/immd_sbevt.c | 23 ++-
> osaf/services/saf/immsv/immloadd/imm_loader.cc | 315
> ++++++++++++++++++++++++++++++++++++------
> osaf/services/saf/immsv/immloadd/imm_loader.hh | 8 +-
> osaf/services/saf/immsv/immloadd/imm_pbe_load.cc | 196
> ++++++++++++++++++++++++-
> osaf/services/saf/immsv/immnd/ImmModel.cc | 333
> ++++++++++++++++++++++++++++++++++---------
> osaf/services/saf/immsv/immnd/ImmModel.hh | 17 +-
> osaf/services/saf/immsv/immnd/ImmSearchOp.cc | 5 -
> osaf/services/saf/immsv/immnd/immnd_cb.h | 6 +-
> osaf/services/saf/immsv/immnd/immnd_evt.c | 345
> +++++++++++++++++++++++++++++++++++++++++++---
> osaf/services/saf/immsv/immnd/immnd_init.h | 8 +-
> osaf/services/saf/immsv/immnd/immnd_main.c | 3 +-
> osaf/services/saf/immsv/immnd/immnd_proc.c | 246
> +++++++++++++++++++++++++++------
> osaf/services/saf/immsv/immpbed/immpbe.cc | 77 +++++----
> osaf/services/saf/immsv/immpbed/immpbe.hh | 4 +
> osaf/services/saf/immsv/immpbed/immpbe_daemon.cc | 677
> +++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++-------
> tests/immsv/implementer/applier.c | 121 ++++++++++++---
> 31 files changed, 2559 insertions(+), 392 deletions(-)
>
>
> Testing Commands:
> -----------------
>
>
> Testing, Expected Results:
> --------------------------
> The basic normal positive use cases should work,
> including the persistification of data to *both* pbe files.
>
> Most negative test cases (failover, killing of processes) should also.
>
> The main risk to watch for is when/if a PBE restarts with '--recover'
> which means it simply re-attaches to the sqlite file without regenerating
> a fresh file dumped from imm-ram, then there is a risk that there could
> have been introduced a discrepancy between the files. In particular, the
> last transaction being processed at the time of the crash could have
> commited at the other PBE, yet have been rolled back at the restarted PBE.
>
> Persistent writes are blocked as long as not both PBEs are available.
>
> Conditions of Submission:
> -------------------------
> These patches will not be pushed.
>
>
> Arch Built Started Linux distro
> -------------------------------------------
> mips n n
> mips64 n n
> x86 n n
> x86_64 n n
> powerpc n n
> powerpc64 n n
>
>
> Reviewer Checklist:
> -------------------
> [Submitters: make sure that your review doesn't trigger any checkmarks!]
>
>
> Your checkin has not passed review because (see checked entries):
>
> ___ Your RR template is generally incomplete; it has too many blank entries
> that need proper data filled in.
>
> ___ You have failed to nominate the proper persons for review and push.
>
> ___ Your patches do not have proper short+long header
>
> ___ You have grammar/spelling in your header that is unacceptable.
>
> ___ You have exceeded a sensible line length in your headers/comments/text.
>
> ___ You have failed to put in a proper Trac Ticket # into your commits.
>
> ___ You have incorrectly put/left internal data in your comments/files
> (i.e. internal bug tracking tool IDs, product names etc)
>
> ___ You have not given any evidence of testing beyond basic build tests.
> Demonstrate some level of runtime or other sanity testing.
>
> ___ You have ^M present in some of your files. These have to be removed.
>
> ___ You have needlessly changed whitespace or added whitespace crimes
> like trailing spaces, or spaces before tabs.
>
> ___ You have mixed real technical changes with whitespace and other
> cosmetic code cleanup changes. These have to be separate commits.
>
> ___ You need to refactor your submission into logical chunks; there is
> too much content into a single commit.
>
> ___ You have extraneous garbage in your review (merge commits etc)
>
> ___ You have giant attachments which should never have been sent;
> Instead you should place your content in a public tree to be pulled.
>
> ___ You have too many commits attached to an e-mail; resend as threaded
> commits, or place in a public tree for a pull.
>
> ___ You have resent this content multiple times without a clear indication
> of what has changed between each re-send.
>
> ___ You have failed to adequately and individually address all of the
> comments and change requests that were proposed in the initial review.
>
> ___ You have a misconfigured ~/.hgrc file (i.e. username, email etc)
>
> ___ Your computer have a badly configured date and time; confusing the
> the threaded patch review.
>
> ___ Your changes affect IPC mechanism, and you don't present any results
> for in-service upgradability test.
>
> ___ Your changes affect user manual and documentation, your patch series
> do not contain the patch that updates the Doxygen manual.
>
------------------------------------------------------------------------------
October Webinars: Code for Performance
Free Intel webinars can help you accelerate application performance.
Explore tips for MPI, OpenMP, advanced profiling, and more. Get the most from
the latest Intel processors and coprocessors. See abstracts and register >
http://pubads.g.doubleclick.net/gampad/clk?id=60134791&iu=/4140/ostg.clktrk
_______________________________________________
Opensaf-devel mailing list
[email protected]
https://lists.sourceforge.net/lists/listinfo/opensaf-devel