Ok thanks will look into your comments.
That review request was a "pre-review request" sent in July, mostly to 
give you and any other reviewers or interested
parties time to understand roughly where I was going with this.

Improvements and tightening has been done on the 2PBE implementation 
since then.
I will be sending out a new and the real formal review request for #21 
early next week.
Some cleanup and documentation work remains.
This is an enhancement and quite a large one at that, so more time than 
usual will be given for  review.

/AndersBj

Neelakanta Reddy wrote:
> Hi Anders,
>
> Following are the comments/questions for the given patch set:
>
> 1. check has to be there if  OPENSAF_IMM_PBE_RT_CLASS_NAME is not found
>
> @@ -942,6 +942,14 @@ int loadImmFromPbe(void* pbeHandle, bool
>         TRACE_2("Successfully Applied the CCB ");
>         saImmOmCcbFinalize(ccbHandle);
>
> +       cimi = 
> classInfoMap.find(std::string(OPENSAF_IMM_PBE_RT_CLASS_NAME));
> +       LOG_NO("Removed class %s RC:%u", (char *) 
> OPENSAF_IMM_PBE_RT_CLASS_NAME,
> +               saImmOmClassDelete(immHandle, (char *) 
> OPENSAF_IMM_PBE_RT_CLASS_NAME));
> +
> +       /*      if(cimi == classInfoMap.end()) {*/
> +       opensafPbeRtClassCreate(immHandle);
> +       /*      }*/
> +
> 2. creating of OsafImmPbeRt should not have been allowed in the 
> pre-load time.
>
> osafimmloadd: ER Failed to create the class OsafImmPbeRt err:6
> osafimmloadd: NO The class OsafImmPbeRt has been created since it was 
> missing from the imm.xml load file
>
>
> 3. when the pbe tries to update the epoch immeadietly after loading 
> following error is returned from the slave:
>
> osafimmpbed: NO Got ERR_NOT_EXIST on atempt to update epoch towards 
> slave PBE
> osafimmpbed: NO Update epoch 4 committing with ccbId:4294967299
> osafimmnd[4773]: NO Implementer (applier) connected: 15 
> (@OpenSafImmPBE) <0, 2010f>
> osafimmnd[4773]: NO Implementer connected: 16 (OsafImmPbeRt_B) <0, 2010f>
>
> 4. The Adminoperation towards slave PBE for class create/delete and 
> ccb apply may effect the IMMSV synchronous timeout.
>
> 5. creating OPENSAF_IMM_PBE_RT_CLASS_NAME while loading if 2PBE is not 
> set must be avoided.
>
> 6. what is the use of creating OPENSAF_IMM_PBE_RT_IMPL_NAME_A
>
> 7. If the callback functions are same, what is the use of creating 
> OPENSAF_IMM_PBE_RT_IMPL_NAME_A and OPENSAF_IMM_PBE_RT_IMPL_NAME_B.
>
> manipulating of present pbe callback functions may be considered. what 
> is the real use of these implementers.
>
> 8. when the same class created/deleted and PRTO create/update is 
> performed, the ccb ID is more in the slave PBE. once the cluster is 
> restarted always slave PBE will have greater ccbid which is chosen by 
> IMMD arbiteration.
>
> 9. while continuely creating ccb-creates, remove the imm.db(a case 
> like nfs failure/stop) then slave PBE tries to recreate the db but fails.
>
> where In the primary PBE failed to aplly the PBE because slave PBE is 
> not there:
>
> osafimmpbed: WA Failed to find CCB object for 18407
> osafimmpbed: WA Start prepare for ccb: 18408 towards slave PBE 
> returned: '12' from Immsv
> osafimmpbed: WA Ccb:18408 failed to prepare towards slave PBE
> osafimmnd[30144]: NO Ccb 18408 ABORTED (immcfg_Slot-3_330)
> osafimmpbed: WA Failed to find CCB object for 18408
> osafimmpbed: WA Start prepare for ccb: 18409 towards slave PBE 
> returned: '12' from Immsv
> osafimmpbed: WA Ccb:18409 failed to prepare towards slave PBE
> osafimmnd[30144]: NO Ccb 18409 ABORTED (immcfg_Slot-3_333)
> osafimmpbed: WA Failed to find CCB object for 18409
>
>
> The slave PBE can not be able to do classimplementerset
>
> Oct  4 15:05:58 Slot-4 osafimmnd[3039]: NO ERR_TRY_AGAIN: ccb 9657 is 
> active on object cscfRdn=75387 of class neNumber. Can not add class 
> applier
> Oct  4 15:06:04 Slot-4 last message repeated 10 times
> Oct  4 15:06:04 Slot-4 osafimmpbed: ER saImmOiClassImplementerSet for 
> neNumber failed 6
> Oct  4 15:06:04 Slot-4 osafimmnd[3039]: NO Implementer locally 
> disconnected. Marking it as doomed 131 <429, 2020f> (@OpenSafImmPBE)
> Oct  4 15:06:04 Slot-4 osafimmnd[3039]: NO Implementer locally 
> disconnected. Marking it as doomed 132 <430, 2020f> (OsafImmPbeRt_B)
> Oct  4 15:06:04 Slot-4 osafimmnd[3039]: NO Implementer disconnected 
> 131 <429, 2020f> (@OpenSafImmPBE)
> Oct  4 15:06:04 Slot-4 osafimmnd[3039]: NO Implementer disconnected 
> 132 <430, 2020f> (OsafImmPbeRt_B)
> Oct  4 15:06:04 Slot-4 osafimmnd[3039]: WA SLAVE PBE process has 
> apparently died at non coord
> Oct  4 15:06:04 Slot-4 osafimmnd[3039]: NO STARTING SLAVE PBE process.
>
>
> 10. In the master-slave of 2PBE approach, another level of time 
> dependecy is created(by calling admin operations). using atomic 
> approach for both PBE's may be faster because, PBEB (@OpenSafImmPBE) 
> receives the callbacks same as master(PBEA).
>
> Thanks,
> Neel.
>
> On Friday 19 July 2013 03:25 PM, Anders Bjornerstedt wrote:
>> Summary: Pre-review of 2PBE complete dataflow.
>> Review request for Trac Ticket(s): (#21)
>> Peer Reviewer(s): Neel
>> Pull request to:
>> Affected branch(es): devel(4.4)
>> Development branch:
>>
>> --------------------------------
>> Impacted area       Impact y/n
>> --------------------------------
>>   Docs                    n
>>   Build system            n
>>   RPM/packaging           n
>>   Configuration files     n
>>   Startup scripts         n
>>   SAF services            n
>>   OpenSAF services        n
>>   Core libraries          n
>>   Samples                 n
>>   Tests                   n
>>   Other                   n
>>
>>
>> Comments (indicate scope for each "y" above):
>> ---------------------------------------------
>>
>> Thisd is a pre-review of the basic data flow solution that I propose 
>> for 2PBE.
>> By pre-review I mean:
>> These specific patches will not be pushed.
>> Thus I am not waiting for an "ack" on these patches.
>> The intention is to communicate the development of 2PBE so far and to 
>> allow
>> anyone to test and experimentat with it, or just inspect the code to get
>> an understanding of how it will work.
>>
>> This is not a complete solution for 2PBE.
>> There is still work to be done to get tighter syncronisation in the 
>> commit
>> of an imm-transaction between the two PBEs. With regular single PBE, 
>> the commit
>> of one imm-transaction was atomic with the commit with the commit to 
>> the sqlite file.
>> ion fact the commit of the sqlite transaction *was* the commit of the 
>> imm-transaction.
>>
>> With 2PBE, the commit of the transaction to PBE is still the commit 
>> of the imm-transaction.
>> But now there is also the issue of if and how to get the two sqlite 
>> files to commit atomically.
>> My current stance on this is that I will not attempt to make the 
>> commit to the two sqlite files
>> 100% atomic. This would either require at least two sqlite 
>> transaction commits/writes for every
>> one imm-transaction, or require the utilization of some open 2pc 
>> interface in sqlite if that
>> is available. I believe there is some kind of callback hook availble 
>> for sqlite transaction
>> prepare. But I will not attempt to use that for now. The basic idea 
>> is instead to keep
>> one primary PBE and then add a slave PBE. To make each 
>> imm-transaction commit with
>> syncronization between the two PBEs before sqlite commit, minimizing 
>> the probability that
>> one sqlite instance succeeds in commiting while the other sqlite 
>> instance fails.
>> Still they may of course diverge, due to file system problems or 
>> crashes.
>> The first thing to note here is that a cluster start, after such a 
>> broken PBE commit,
>> will be handled by the loading arbitration and that arbitration will 
>> choose to load
>> from the sqlite file that succeeded in the commit (the latest file) 
>> as long as it
>> is available. And even if the cluster restart should come up with one 
>> SC with the
>> file that failed to commit the last ccb and timeout in waiting for 
>> the other and
>> thus load from the older file, then there is also no problem because 
>> this only means
>> that the transaction will have been aborted. Note that the 
>> user/client of  the transaction
>> will not have obtained any answer on the outcome unless both PBEs 
>> succeeded in their commit.
>>
>> For regular CCBs, the primary PBE will prepare by executing all the 
>> sqlite calls needed to
>> build the transaction, but before comitting it sends a a message to 
>> the slave PBE asking
>> it it has received all requests to be part of the ccb and if so to 
>> also execute all
>> the sqlite calls to prepare the ccb. When/if the standby replies with 
>> ok, then the
>> primary PBE will commit its sqlite transaction and reply to immsv. 
>> The immsv will then
>> commit the ccb in imm-cluster-ram. Finaly as part of sending messages 
>> about the commit
>> of the ccb to all appliers, the slave PBE will get both completed and 
>> apply and will
>> (hopefully) commit its sqlite transaction.
>>
>> In general, the slave PBE is tightly controlled by the primary PBE 
>> (an asymetric solution).
>> For class management and PRTO/PRTA handling the solution is slightly 
>> different.
>> In general I have also tried to leverage from existing mechanisms, 
>> such as making the
>> slave PBE an applier of all config classes, thus avoiding additional 
>> and unnecessary
>> distributed messaging for the payload of a ccb.
>>
>> changeset 131d635a526a0e894245ecb08b630268f9035976
>> Author:    Anders Bjornerstedt <[email protected]>
>> Date:    Fri, 19 Jul 2013 08:39:25 +0200
>>
>>     IMM: 2PBE loading (test patch-1) [#21]
>>
>>     This is a testpatch containing a test version of the 2PBE loading 
>> mechanism.
>>     This patch will not be pushed. The intent with this patch is to 
>> allow
>>     testing and obtaining feedback on the 2PBE loading.
>>
>> This firt patch has aready been sent out once earlier for pre-review.
>> Keep in mind that it is for cluster restarts that all this is intended.
>> So in one sense this is the most important patch to be as reliable as 
>> possible.
>> --------------------------------------------------------------
>>
>> changeset f240bedd6aa2b92bdea38d1c6532ac8097ac9444
>> Author:    Anders Bjornerstedt <[email protected]>
>> Date:    Fri, 19 Jul 2013 08:45:07 +0200
>>
>>     IMM: 2PBE ccb-handling (test patch-2) [#21]
>>
>>     This is a testpatch containing a test version of the 2PBE 
>> ccb-handling. This
>>     patch will not be pushed. The intent with this patch is to allow 
>> testing and
>>     obtaining feedback on the 2PBE ccb-handling. This test-patch goes 
>> on top of
>>     the "2PBE loading" testpatch.
>>
>> Contains the process management changes to start and restart 2 PBEs, 
>> including failover
>> handling. Plus the data flow solution for CCBs. THe detailed 
>> synhronisation of ccb commit
>> between the two PBEs is still missing.
>> -------------------------------------------------------------
>>
>> changeset 421acf69d0765d0860bfe726b8f1ba1ccb519e6b
>> Author:    Anders Bjornerstedt <[email protected]>
>> Date:    Fri, 19 Jul 2013 08:55:23 +0200
>>
>>     IMM: 2PBE class-create/delete/schema change handling (test 
>> patch-3) [#21]
>>
>>     This is a testpatch containing a test version of the 2PBE 
>> class-handling.
>>     This patch will not be pushed. The intent with this patch is to 
>> allow
>>     testing and obtaining feedback on the 2PBE handling of imm 
>> classes. This
>>     test-patch goes on top of the "2PBE ccb-handling" testpatch.
>>
>> Provides 2PBE persistification of class create/delete/schema-change.
>> Extends the existing PBE oultion that uses an admin-op from immsv to PBE
>> so that the primary PBE invokes the same admin-op towards slave.
>>
>> ---------------------------------------------------------------
>>
>> changeset d6f91100b51e2b7215bf5fab95e10d0c069d14db
>> Author:    Anders Bjornerstedt <[email protected]>
>> Date:    Fri, 19 Jul 2013 09:10:43 +0200
>>
>>     IMM: 2PBE PRTO-create handling (test patch-4) [#21]
>>
>>     This is a testpatch containing a test version of the 2PBE 
>> handling of
>>     creates of persistent runtime objects (PRTOs). This patch will 
>> not be
>>     pushed. The intent with this patch is to allow testing and obtaining
>>     feedback on the 2PBE handling of PRTO creates This test-patch 
>> goes on top of
>>     the "2PBE class-handling" testpatch.
>>
>> Provides 2PBE persistification of PRTO create.
>> In this case the immsv directly sends the same payload callbacks to 
>> both the primary
>> and slave PBE. The primary will invoke an admin-op towards the slave 
>> to syncronize
>> (not implemented).
>>
>> changeset 416b82d2d116c2a9ede63b507c72c10ee2126f42
>> Author:    Anders Bjornerstedt <[email protected]>
>> Date:    Fri, 19 Jul 2013 09:16:17 +0200
>>
>>     IMM: 2PBE PRTO-delete handling (test patch-5) [#21]
>>
>>     This is a testpatch containing a test version of the 2PBE 
>> handling of
>>     deletes of persistent runtime objects (PRTOs). This patch will 
>> not be
>>     pushed. The intent with this patch is to allow testing and obtaining
>>     feedback on the 2PBE handling of PRTO deletes This test-patch 
>> goes on top of
>>     the "2PBE PRTO-create" testpatch.
>>
>> Provides 2PBE persistification of PRTO delete. This operation is 
>> quite different
>> from PRTO create because PRTO delete can in general be the delete of 
>> a subtree,
>> i.e. several PRTOs that have to be deleted as one transaction. Again 
>> the immsv
>> sends the same messages that it sent to the PBE, now also to the 
>> slave PBE.
>>
>> changeset 894df553ef7e1b829db43329a5bafe4ce7fb2cbd
>> Author:    Anders Bjornerstedt <[email protected]>
>> Date:    Fri, 19 Jul 2013 09:20:45 +0200
>>
>>     IMM: 2PBE PRTA-update handling (test patch-6) [#21]
>>
>>     This is a testpatch containing a test version of the 2PBE 
>> handling of
>>     updates of persistent runtime attributes (PRTAs). PRTAs can exist 
>> in either
>>     PRTOs or config objects. This patch will not be pushed. The 
>> intent with this
>>     patch is to allow testing and obtaining feedback on the 2PBE 
>> handling of
>>     PRTA updates This test-patch goes on top of the "2PBE PRTO-delete"
>>
>> Provides 2PBE persistification of PRTA updates. This more similar to 
>> PRTO create
>> than it is PRTO delete, but PRTA updaes can be an update in a config 
>> object.
>>
>>
>>
>> Complete diffstat:
>> ------------------
>>   osaf/libs/agents/saf/imma/imma_oi_api.c          |    4 +-
>>   osaf/libs/agents/saf/imma/imma_proc.c            |   34 +++-
>>   osaf/libs/common/immsv/immpbe_dump.cc            |   62 ++++++-
>>   osaf/libs/common/immsv/immsv_evt.c               |   55 ++++++-
>>   osaf/libs/common/immsv/include/immpbe_dump.hh    |    6 +-
>>   osaf/libs/common/immsv/include/immsv_api.h       |   17 +-
>>   osaf/libs/common/immsv/include/immsv_evt.h       |   18 +-
>>   osaf/libs/common/immsv/include/immsv_evt_model.h |    4 +
>>   osaf/services/saf/immsv/immd/immd_amf.c          |    5 +-
>>   osaf/services/saf/immsv/immd/immd_cb.h           |    6 +-
>>   osaf/services/saf/immsv/immd/immd_db.c           |    2 +
>>   osaf/services/saf/immsv/immd/immd_evt.c          |   82 ++++++++++-
>>   osaf/services/saf/immsv/immd/immd_main.c         |   40 +++++-
>>   osaf/services/saf/immsv/immd/immd_proc.c         |  229 
>> +++++++++++++++++++++++++++++-
>>   osaf/services/saf/immsv/immd/immd_proc.h         |    3 +-
>>   osaf/services/saf/immsv/immd/immd_sbevt.c        |   23 ++-
>>   osaf/services/saf/immsv/immloadd/imm_loader.cc   |  315 
>> ++++++++++++++++++++++++++++++++++++------
>>   osaf/services/saf/immsv/immloadd/imm_loader.hh   |    8 +-
>>   osaf/services/saf/immsv/immloadd/imm_pbe_load.cc |  196 
>> ++++++++++++++++++++++++-
>>   osaf/services/saf/immsv/immnd/ImmModel.cc        |  333 
>> ++++++++++++++++++++++++++++++++++---------
>>   osaf/services/saf/immsv/immnd/ImmModel.hh        |   17 +-
>>   osaf/services/saf/immsv/immnd/ImmSearchOp.cc     |    5 -
>>   osaf/services/saf/immsv/immnd/immnd_cb.h         |    6 +-
>>   osaf/services/saf/immsv/immnd/immnd_evt.c        |  345 
>> +++++++++++++++++++++++++++++++++++++++++++---
>>   osaf/services/saf/immsv/immnd/immnd_init.h       |    8 +-
>>   osaf/services/saf/immsv/immnd/immnd_main.c       |    3 +-
>>   osaf/services/saf/immsv/immnd/immnd_proc.c       |  246 
>> +++++++++++++++++++++++++++------
>>   osaf/services/saf/immsv/immpbed/immpbe.cc        |   77 +++++----
>>   osaf/services/saf/immsv/immpbed/immpbe.hh        |    4 +
>>   osaf/services/saf/immsv/immpbed/immpbe_daemon.cc |  677 
>> +++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++-------
>>  
>>
>>   tests/immsv/implementer/applier.c                |  121 
>> ++++++++++++---
>>   31 files changed, 2559 insertions(+), 392 deletions(-)
>>
>>
>> Testing Commands:
>> -----------------
>>
>>
>> Testing, Expected Results:
>> --------------------------
>> The basic normal positive use cases should work,
>> including the persistification of data to *both* pbe files.
>>
>> Most negative test cases (failover, killing of processes) should also.
>>
>> The main risk to watch for is when/if a PBE restarts with '--recover'
>> which means it simply re-attaches to the sqlite file without 
>> regenerating
>> a fresh file dumped from imm-ram, then there is a risk that there could
>> have been introduced a discrepancy between the files. In particular, the
>> last transaction being processed at the time of the crash could have
>> commited at the other PBE, yet have been rolled back at the restarted 
>> PBE.
>>
>> Persistent writes are blocked as long as not both PBEs are available.
>>
>> Conditions of Submission:
>> -------------------------
>> These patches will not be pushed.
>>
>>
>> Arch      Built     Started    Linux distro
>> -------------------------------------------
>> mips        n          n
>> mips64      n          n
>> x86         n          n
>> x86_64      n          n
>> powerpc     n          n
>> powerpc64   n          n
>>
>>
>> Reviewer Checklist:
>> -------------------
>> [Submitters: make sure that your review doesn't trigger any checkmarks!]
>>
>>
>> Your checkin has not passed review because (see checked entries):
>>
>> ___ Your RR template is generally incomplete; it has too many blank 
>> entries
>>      that need proper data filled in.
>>
>> ___ You have failed to nominate the proper persons for review and push.
>>
>> ___ Your patches do not have proper short+long header
>>
>> ___ You have grammar/spelling in your header that is unacceptable.
>>
>> ___ You have exceeded a sensible line length in your 
>> headers/comments/text.
>>
>> ___ You have failed to put in a proper Trac Ticket # into your commits.
>>
>> ___ You have incorrectly put/left internal data in your comments/files
>>      (i.e. internal bug tracking tool IDs, product names etc)
>>
>> ___ You have not given any evidence of testing beyond basic build tests.
>>      Demonstrate some level of runtime or other sanity testing.
>>
>> ___ You have ^M present in some of your files. These have to be removed.
>>
>> ___ You have needlessly changed whitespace or added whitespace crimes
>>      like trailing spaces, or spaces before tabs.
>>
>> ___ You have mixed real technical changes with whitespace and other
>>      cosmetic code cleanup changes. These have to be separate commits.
>>
>> ___ You need to refactor your submission into logical chunks; there is
>>      too much content into a single commit.
>>
>> ___ You have extraneous garbage in your review (merge commits etc)
>>
>> ___ You have giant attachments which should never have been sent;
>>      Instead you should place your content in a public tree to be 
>> pulled.
>>
>> ___ You have too many commits attached to an e-mail; resend as threaded
>>      commits, or place in a public tree for a pull.
>>
>> ___ You have resent this content multiple times without a clear 
>> indication
>>      of what has changed between each re-send.
>>
>> ___ You have failed to adequately and individually address all of the
>>      comments and change requests that were proposed in the initial 
>> review.
>>
>> ___ You have a misconfigured ~/.hgrc file (i.e. username, email etc)
>>
>> ___ Your computer have a badly configured date and time; confusing the
>>      the threaded patch review.
>>
>> ___ Your changes affect IPC mechanism, and you don't present any results
>>      for in-service upgradability test.
>>
>> ___ Your changes affect user manual and documentation, your patch series
>>      do not contain the patch that updates the Doxygen manual.
>>
>


------------------------------------------------------------------------------
October Webinars: Code for Performance
Free Intel webinars can help you accelerate application performance.
Explore tips for MPI, OpenMP, advanced profiling, and more. Get the most from 
the latest Intel processors and coprocessors. See abstracts and register >
http://pubads.g.doubleclick.net/gampad/clk?id=60134791&iu=/4140/ostg.clktrk
_______________________________________________
Opensaf-devel mailing list
[email protected]
https://lists.sourceforge.net/lists/listinfo/opensaf-devel

Reply via email to