Re: [devel] [PATCH 1 of 1] IMM: Update immsv/README describing imm enhancements in 4.7 [#1499]

Neelakanta Reddy Mon, 05 Oct 2015 03:05:30 -0700

Hi AndersBj,

Reviewed the patch.
Ack.


One minor comment inline, marked with [Neel].

/Neel.

On Monday 05 October 2015 01:56 PM, Anders Bjornerstedt wrote:
>   osaf/services/saf/immsv/README |  309 ++++++++++++++++++++++++++++++++++++++++-
>   1 files changed, 306 insertions(+), 3 deletions(-)
>
>
> diff --git a/osaf/services/saf/immsv/README b/osaf/services/saf/immsv/README
> --- a/osaf/services/saf/immsv/README
> +++ b/osaf/services/saf/immsv/README
> @@ -2378,6 +2378,81 @@ Bit 5 controls OpenSAF4.5 protocols allo
>   Bit 6 controls OpenSAF4.6 protocols allowed or not (normally on/1).
>   
>   
> +Remove unnecessary detour of accessorGet into ImmModel::searchInitialize (4.7)
> +==============================================================================
> +http://sourceforge.net/p/opensaf/tickets/674/
> +
> +A minor enhancement that refactored some code in ImmModel::accessorGet and
> +ImmModel::searchInitialize.
> +
> +
> +immcfg should support ccbObjModify after ccbObjCreate in same ccb (4.7)
> +=======================================================================
> +http://sourceforge.net/p/opensaf/tickets/1283
> +
> +In "explicit commit mode" the immcfg command allows a user to build
> +up multi-operation CCBs. The IMM API in general allows a CCB to have
> +an object create operation later followed by an object modify operation
> +on the object that was created in the same CCB. But due to a limitation
> +in the implementation of the immcfg command/tool this sequence was not
> +supported by immcfg. This enhancement removed that limitation.
> +
> +
> +Abort non-critical CCBs when implementer is disconnected (4.7)
> +==============================================================
> +http://sourceforge.net/p/opensaf/tickets/1391
> +
> +During the buildup of a CCB, i.e. a CCB where the OM client has successfully
> +added one or more operations to the CCB, but then the OM client lingers or
> +performs some other task not related to the open CCB; if an OI involved in
> +the open and idle CCB detaches, then the CCB is doomed to be aborted. Only
> +when the OM client invoked the next request was the abort of the CCB processed.
> +
> +One problem with this was that a restart of the detached OI would have to wait
> +for the CCB to get aborted before it could attach, which could mean waiting
> +indefinitely on some oblivious om-client to act.
> +
> +This enhancement ensures that instead of having to wait for the OM client to
> +trigger the abort and cleanup of the CCB when it invokes the next operation,
> +the imm server can immediately process the abort as triggered by the OI detach.
> +
> +
> +Periodically audit the PBE imm.db file (4.7)
> +============================================
> +http://sourceforge.net/p/opensaf/tickets/19/
> +
> +A first and very limited form of PBE file audit has been implemented in
> +OpenSAF 4.7. This is due to a serious incident with reference attributes that
> +have the NO_DANGLING flag set. See critical ticket #1377.
> +
> +In principle the audit function can and should be extended to run periodically
> +as a background job and to cover more consistency checks. In the absence of any
> +pro-active driver, it should at least be extended as a side effect of any future
> +incidents/cases of inconsistency in the persistent imm data that
> +actually occur.
> +
> +
> +Don't check for pending fevs when only updating pure runtime attributes (4.7)
> +=============================================================================
> +http://sourceforge.net/p/opensaf/tickets/1445
> +
> +The IMMND server process has a flow control mechanism to prevent the
> +IMMNDs (one at each processor of the cluster) from overloading the single active
> +IMMD at the active SC with fevs requests. The message type used when an OI
> +updates its runtime attributes is conditionally a fevs message. It will become
> +a fevs message for the normal case of the OI updating a cached runtime
> +attribute. But when the OI is updating a pure (non cached) runtime attribute
> +as a side effect of a processor local OM read request on this attribute, then
> +the update of the attribute is actually only done locally and is not sent
> +over fevs. Despite this the update of a pure and local runtime attribute
> +followed the same code path and ended up being pushed back with TRY_AGAIN
> +towards the OI, if fevs traffic was heavy. This push back (flow control) was
> +in this case totally unnecessary since this variant of the runtime attribute
> +update message would not be sent over fevs. This potential and unnecessary
> +delay of an update of a pure runtime attribute has been removed by this
> +enhancement.
> +
> +
>   Provide an admin-operation for aborting all non-critical CCBs (4.7)
>   ===================================================================
>   http://sourceforge.net/p/opensaf/tickets/1107
> @@ -2385,22 +2460,250 @@ http://sourceforge.net/p/opensaf/tickets
>   There may arise situations where an open CCB that is not in critical,
>   i.e. has not entered the commit protocol yet, is blocking an involved
>   service/OI from performing some other task that is more urgent and more
> -important than completing that CCB. The best example is the AMF, where
> +important than completing that CCB. If the OM client is idling and the
> +built up CCB is so far well formed (accepted by involved OIs) then the
> +OM client can linger indefinitely, particularly a human operator.
> +
> +A good example where a more urgent task may be blocked is the AMF, where
>   an si-swap will fail and cause the standby to reboot if it was involved
>   in an open CCB when the si-swap order was issued (see ticket #1105).
> -Ticket #1105 can be fixed by the AMF (active or standby) sending an
> +Ticket #1105 can be fixed by the AMF (active or standby) sending this
>   admin-operation directed at the IMM service requesting it to abort non
>   critical CCBs. The AMF can either use a synchronous admin-op or an
> -asyncronous admin-op. After the admin-operation has been invoked the AMF
> +asynchronous admin-op. After the admin-operation has been invoked the AMF
>   should allow a few seconds for the CCB to get aborted and the AMF OI to
>   get the abort callback for the CCB. That should then clear the path for
>   the AMF standby to succeed with the si-swap.
> +
>   The admin-operation for aborting non critical CCBs involves requesting the
>   operation id '202' directed at the IMM SF service object:
>   
>       immadm -o 202 safRdn=immManagement,safApp=safImmService
>   
>   
> +Use error string to classify cause for aborted CCB (4.7)
> +========================================================
> +http://sourceforge.net/p/opensaf/tickets/744/
> +
> +For the case of a ccb-operation resulting in the return of:
> +
> +    SA_AIS_ERR_FAILED_OPERATION
> +
> +which means the CCB has been aborted, there is a need for some clients
> +to discriminate between the two possible generic categories of abort:
> +
> +     Validation-abort.
> +or
> +     Resource-abort.
> +
> +A validation-abort is the result when the operations added by the OM-client to
> +the CCB constitute an *invalid* set of operations according to either
> +fundamental rules enforced by the IMM service or model specific rules enforced
> +by the involved OIs relative to the *current* state of the database.
> +
> +A resource-abort is the result if there is a resource problem somewhere in the
> +system such that the CCB can not be committed or further processed at this time.
> +There are many kinds of resource aborts. A few examples would be:
> +
> +      Ccb is aborted because it interferes with another operation in the system.
> +      Ccb is aborted due to loss of contact with some involved OI.
> +      Ccb is aborted due to loss of contact with PBE before apply request.
> +      Ccb is aborted because an OI reports that it has insufficient resources.
> +
> +Note that SA_AIS_ERR_FAILED_OPERATION is the *only* error code with the meaning
> +that the CCB has been aborted. The fact that the CCB has been aborted is the
> +dominating issue/fact that must be understood and coped with by any and all
> +applications using the CCB interface. The further discrimination into validation
> +abort or resource abort is only relevant for *some* applications that are capable
> +of handling the abort differently depending on this discrimination.
> +
> +The fundamental logical difference between validation abort and resource abort
> +is that a validation abort is due to the CCB being incorrectly formed by the
> +OM-client; while a resource abort is due to problems in the server or OIs
> +making it "physically" impossible to successfully commit the this CCB even
> +though it *may* be a correctly formed CCB.
> +
[Neel] Typo above: "commit the this CCB" should read "commit this CCB".
> +The possible actionable difference for the OM client between the two abort
> +categories is that a replay by the OM client of a resource aborted CCB could
> +succeed. This would be the case if the resource problem has been resolved. But
> +a replay by the client of a validation aborted CCB will never succeed, except
> +for some rare cases where another CCB was interleaved and altered the database
> +in such a way as to *make* the replay suddenly become a valid change relative
> +to a *new* state. This latter case should be so rare as to be ignored.
> +
> +So the bottom line is that some applications may wish to determine if an
> +aborted CCB that is replayed will have some chance of successfully committing.
> +The application can do this by obtaining the abort category.
> +
> +This enhancement makes it possible for an application or end-user to obtain the
> +abort category. The abort category is prepended as a string prefix to one or more
> +error strings returned by saImmOmCcbGetErrorStrings().
> +
> +A validation aborted CCB will have an error string prefixed with:
> +
> +             "IMM: Validation abort:"
> +
> +A resource aborted CCB will have an error string prefixed with:
> +
> +              "IMM: Resource abort:"
> +
> +The prefix provides both a readable tag for human end-users and a standardized
> +prefix that may be string-matched in a program or script.
> +
> +Finally there are some subtle issues that should be understood by applications
> +intending to discriminate between resource abort and validation abort.
> +
> +a) A Ccb may be aborted for more than one reason. Thus a Ccb that is actually
> +not a valid Ccb (absolutely or relative to current IMM state) may run into a
> +resource problem during buildup and thus be aborted as a resource abort before
> +it had the chance to be evaluated for validity.
> +
> +b) There may be more than one OI involved in a Ccb. Some OIs may reply OK
> +on validation (completed callback), some OIs may reply with validation
> +error, and some OIs may reply with resource error or default on timeout
> +resulting in resource error. The end result is that if there is one validation
> +error "vote" then that will dominate the resulting abort category.
> +
> +c) Validation success or failure often depends on the current state, i.e.
> +the state of the set of involved objects *before* the attempted Ccb. Such a
> +state may be changed by some other client applying its own Ccb. So a Ccb
> +that is aborted with validation error, may later succeed due to the changed
> +prior state. But the assumption here is that the two Ccb clients are not
> +communicating with each other and so this possible case should be ignored
> +by the application.
> +
> +d) There is never any general guarantee that a Ccb that is replayed after a
> +resource abort will "sooner or later" be validated successfully and commit.
> +Thus any loop that replays a Ccb on resource abort should still limit the
> +number of retries.
> +
> +e) All error strings generated by the IMM service are prefixed with "IMM:"
> +The prefix means that the OM client can recognize error strings generated by
> +the IMM service as distinct from error strings generated by an OI.
> +
> +f) For the case where there is no error string or no prefixed error string,
> +the application may assume that it is a resource abort.
> +
> +g) If the error strings contain prefixes for both validation abort and resource
> +abort, then validation abort dominates.
> +
> +
> +Add attribute definition flag SA_IMM_ATTR_DEFAULT_REMOVED (4.7)
> +===============================================================
> +http://sourceforge.net/p/opensaf/tickets/1471
> +
> +Support for removing a default value from an attribute definition in a class
> +definition has now been added. This kind of upgrade was not allowed previously
> +since it is inherently a case of non backwards compatibility and can cause
> +problems for legacy applications/users expecting and relying on the default.
> +That is after all the entire point of having a default.
> +
> +This enhancement removes the restriction of not allowing the removal of a
> +default value if this new flag is set. The effect of this flag is that
> +when/if an object of the class is created with no value assigned to the
> +attribute that used to have a default but no longer has a default; then a
> +syslog message is generated noting that this attribute used to have a default
> +but no longer has a default and will in this case have no value.
> +
> +This is to assist users or troubleshooters if they get some form of problem
> +by the removal of the default. The syslog message should speed up
> +troubleshooting and prevent the creation of unnecessary tickets or trouble
> +reports.
> +
> +
> +Sync data Mbcsv check pointing can be optimized (4.7)
> +=====================================================
> +http://sourceforge.net/p/opensaf/tickets/952/
> +
> +Imm sync messages can be large. They are sent over fevs. Fevs messages are
> +handled by the active IMMD by MDS broadcasting them to all IMMNDs. Before
> +doing the broadcast, the active IMMD uses the message based checkpoint
> +service Mbcsv, to checkpoint each fevs message to the standby IMMD. This is
> +to secure that a failover or switchover of SC/IMMD will not cause any gap
> +in the fevs count for the fevs broadcasts. But if a failover actually happens
> +during a sync, then the new active IMMD will abort the sync. There is then
> +no point in ever re-broadcasting the contents of a sync message. The only
> +thing that needs to be re-broadcast is the fevs message header to close any
> +gap in the fevs count. This enhancement truncates the sync messages to only
> +contain the fevs header, before they are checkpointed to the standby IMMD.
> +This speeds up the sync and removes some communication load.
> +
> +
> +PBE: imm.db.XXXXXX temp files should managed in pbe subdirectory (4.7)
> +======================================================================
> +http://sourceforge.net/p/opensaf/tickets/896/
> +
> +The PBE generates the imm.db file in what should be a local tmp directory.
> +This is set by the configuration variable IMMSV_PBE_TMP_DIR in the immnd.conf
> +configuration file. By default it is /tmp.
> +
> +This enhancement of PBE creates a sub-directory under IMMSV_PBE_TMP_DIR where all
> +temporary PBE files are created. This will reduce the risk of interference with
> +other services and applications sharing the tmp directory. It also facilitates
> +the safe cleanup of all such temporary files by a PBE that is restarted.
> +
> +
> +PBE: Detach of PBE should abort all non-critical and non-empty CCBs (4.7)
> +=========================================================================
> +http://sourceforge.net/p/opensaf/tickets/1261/
> +
> +This enhancement is related to defect ticket [#1260].
> +
> +If the PBE detaches while there are any active non-critical and non-empty CCBs,
> +then such CCBs should be ABORTED, i.e. prevented from being further processed.
> +
> +The abort must be done when the detach arrives over fevs. It must NOT be done
> +when the initial IMMND local PBE detach occurs, since that would make the ccb
> +state deviate locally.
> +
> +
> +Make it possible to run valgrind on osafimmnd when PBE is enabled (4.7)
> +=======================================================================
> +http://sourceforge.net/p/opensaf/tickets/1496/
> +
> +A minor enhancement removing an obstacle to executing the IMMND process under
> +valgrind when PBE is enabled.
> +
> +Notes on upgrading from OpenSAF 4.[1,2,3,4,5,6] to OpenSAF (4.7)
> +================================================================
> +OpenSAF4.7 adds a new attribute flag allowing the removal of default value
> +definitions (#1471). During a rolling upgrade from an earlier OpenSAF release
> +to the 4.7 release there will be nodes executing the older release concurrently
> +with nodes executing OpenSAF 4.7. Nodes executing the earlier release will not
> +recognize the new attribute flag originating from nodes executing 4.7.
> +
> +Because of this upgrade issue, the new attribute flag added in OpenSAF 4.7 is
> +not allowed unless a flag is toggled on in the opensafImmNostdFlags runtime
> +attribute in the object:
> +
> +   opensafImm=opensafImm,safApp=safImmService.
> +
> +The following is the shell command:
> +
> +        immadm -o 1 -p opensafImmNostdFlags:SA_UINT32_T:64 \
> +           opensafImm=opensafImm,safApp=safImmService
> +
> +This will set bit 7 of the 'opensafImmNostdFlags' runtime attribute inside the immsv.
> +Operation-id '1' invoked on the object:
> +
> + 'opensafImm=opensafImm,safApp=safImmService'
> +
> +has the meaning of 'flags-ON'. Operation-id '2' has the meaning of 'flags-OFF'.
> +This flag (and possibly other relevant flags) needs to be toggled ON when the upgrade
> +to OpenSAF 4.7 has been successfully completed. This would be in some final step of
> +the upgrade. Any cluster start/restart of an OpenSAF4.7 system will always
> +automatically toggle on relevant flags.
> +
> +In summary:
> +
> +Bit 1 controls schema (imm class) changes allowed or not (normally off/0).
> +Bit 2 controls OpenSAF4.1 protocols allowed or not (normally on/1).
> +Bit 3 controls OpenSAF4.3 protocols allowed or not (normally on/1).
> +Bit 4 controls 2PBE oneSafe2PBE, see 2PBE feature in OpenSAF4.4 above (normally off/0).
> +Bit 5 controls OpenSAF4.5 protocols allowed or not (normally on/1).
> +Bit 6 controls OpenSAF4.6 protocols allowed or not (normally on/1).
> +Bit 7 controls OpenSAF4.7 protocols allowed or not (normally on/1).
> +
>   ----------------------------------------
>   DEPENDENCIES
>   ============
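
As an illustration of the abort-category rules quoted above (the two prefixes
plus notes f and g), a client-side classifier could look roughly like the
sketch below. It is only a sketch in Python: classify_abort is a hypothetical
helper, not part of the IMM API, and it assumes the caller has already copied
the strings returned by saImmOmCcbGetErrorStrings() into a plain list.

```python
# Prefixes as given in the README text quoted above.
VALIDATION_PREFIX = "IMM: Validation abort:"
RESOURCE_PREFIX = "IMM: Resource abort:"


def classify_abort(error_strings):
    """Return "validation" or "resource" for an aborted CCB.

    error_strings: list of str, assumed to hold the error strings of a CCB
    that failed with SA_AIS_ERR_FAILED_OPERATION (hypothetical input shape).
    """
    # Note g: if both prefixes are present, validation abort dominates,
    # so a single validation-prefixed string decides the category.
    if any(s.startswith(VALIDATION_PREFIX) for s in error_strings):
        return "validation"
    # Explicit resource prefix, or note f: no error string / no prefixed
    # error string may be treated as a resource abort.
    return "resource"
```

A caller would typically only consider replaying the CCB when the result is
"resource".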

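
Note d in the quoted text says that a replay loop on resource abort should
still limit the number of retries. A minimal sketch of such a bounded loop is
given below. The names replay_ccb and attempt are hypothetical: attempt stands
for whatever application code rebuilds and applies the CCB, and is assumed to
return None on success or the abort category string on failure.

```python
def replay_ccb(attempt, max_retries=3):
    """Run attempt() once, then replay at most max_retries times.

    attempt: hypothetical callable returning None on success, or the abort
    category ("validation" or "resource") when the CCB was aborted.
    Returns True if some attempt succeeded, otherwise False.
    """
    for _ in range(1 + max_retries):
        category = attempt()
        if category is None:
            return True   # CCB committed
        if category == "validation":
            return False  # replaying an invalid CCB is assumed pointless
        # Resource abort: fall through and replay, up to the retry limit.
    return False          # retry budget exhausted
```

A real client would probably also sleep between attempts; that is left out to
keep the sketch short.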

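
The upgrade notes pass the value 64 to immadm in order to set bit 7 of
opensafImmNostdFlags, which implies that the bits in the summary list are
numbered from 1. The arithmetic can be checked with the small sketch below.
The helper names are hypothetical, and the 32-bit width is an assumption taken
from the SA_UINT32_T type in the quoted command.

```python
def nostd_flag_value(bit):
    """Value to pass for a 1-based bit number (bit 7 gives 64)."""
    if bit < 1 or bit > 32:
        raise ValueError("bit must be in 1..32")
    return 1 << (bit - 1)


def bits_set(flags):
    """List the 1-based bit numbers that are set in a flags value."""
    return [b for b in range(1, 33) if flags & nostd_flag_value(b)]
```

For example, bits_set(64) yields [7], matching the statement that the quoted
immadm command sets bit 7.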
------------------------------------------------------------------------------
_______________________________________________
Opensaf-devel mailing list
[email protected]
https://lists.sourceforge.net/lists/listinfo/opensaf-devel
