The ability of IMMNDs to tolerate the absence of IMMD is proposed to be 
configured by a new environment variable set in immd.conf:

# If the immsv is to allow absence of the IMMD service-id (indefinitely)
# then the config parameter below is to be uncommented. Absent IMMD is
# part of the OpenSAF feature "Hydra" which has the goal of increasing
# OpenSAF's resilience in the face of both active and standby SC going down.
# Both SCs being absent implies that all OpenSAF director services are absent
# indefinitely, until an SC is re-established. Prior to the Hydra enhancement,
# departure of both SCs always resulted in a cluster restart. With the Hydra
# enhancement, payloads continue to provide reduced and limited service until
# an SC-active is re-established. For the IMM service, Hydra is configured by
# uncommenting the environment variable below.
# Support for absent IMMD is incompatible with 2PBE. If both are configured
# then 2PBE will win and the absence of IMMD feature will be ignored. An error
# message is printed in this case to the syslog at startup.
# Allowing absent immd is a configuration choice impacting all OpenSAF
# services, not just the immsv. If it is to be allowed then it is not
# sufficient to only configure the immsv for absent immd. 
# The level of service that is provided during absent SC depends on the
# particular service. In the case of the IMM service, the service provided
# during IMMD absence is in essence only the reading of config data. 
#export IMMSV_ABSENT_IMMD_ALLOWED=1
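
The precedence rule in the comment above (2PBE wins over absent IMMD, with an error logged at startup) can be illustrated with a minimal sketch. This is not the immsv implementation. The function name, the logger name, and the `two_pbe_configured` flag are hypothetical, and the sketch assumes that the variable counts as enabled whenever it is set, whatever its value.

```python
import logging

log = logging.getLogger("immsv-sketch")  # hypothetical logger name


def absent_immd_enabled(env, two_pbe_configured):
    """Return True if tolerance of absent IMMD should be active.

    env: mapping of environment variables (for example os.environ).
    two_pbe_configured: hypothetical flag supplied by the caller; how 2PBE
    is detected is not described in the text above.
    """
    # Assumption: the feature is requested whenever the variable is set.
    requested = env.get("IMMSV_ABSENT_IMMD_ALLOWED") is not None
    if requested and two_pbe_configured:
        # 2PBE wins; the absent-IMMD setting is ignored and an error is logged.
        log.error("IMMSV_ABSENT_IMMD_ALLOWED ignored: incompatible with 2PBE")
        return False
    return requested
```

Calling `absent_immd_enabled(os.environ, two_pbe_configured=False)` at startup would then decide whether the node may continue without IMMD.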



---

**[tickets:#1232] Imm: Support for indefinite absence of IMMDs (Hydra)**

**Status:** assigned
**Milestone:** future
**Created:** Tue Dec 09, 2014 10:47 AM UTC by Anders Bjornerstedt
**Last Updated:** Tue Dec 09, 2014 12:31 PM UTC
**Owner:** Anders Bjornerstedt

This ticket tracks the IMM part of the general ticket (#1132) for supporting 
indefinite absence of SCs (also called HydraV1). The overall goal will of 
course be to minimize the incapacitated "headless" time period. But that second 
part, of how to quickly re-locate and recover an SC, is not really a task for 
the IMM service and thus not an issue for a complete Hydra solution for IMM. 
This is why this ticket is tagged 'Hydra' and not 'HydraV1'.

Ticket #1132 only describes one part of a complete solution to a problem, 
namely how to handle and recover from the loss of SC service. The part of how, 
and how quickly, to recover SC functionality (we can call it HydraV2) is not 
tracked by #1132. It then makes no sense for any user to enable the Hydra 
feature (should it be delivered) as long as *only* HydraV1 is provided. It then 
also makes little sense to deliver only #1132 to an OpenSAF release, since it 
only constitutes half a solution.

This ticket focuses on allowing the IMM service to "survive", with reasonably 
consistent behavior, in the face of having no director (IMMD), for an 
indefinite period of time. As soon as an SC returns, i.e. as soon as the IMMD 
service returns, then the IMM service will normalize. This *is* covered by this 
ticket.

During the absence of IMMD, the IMM service will of course be severely degraded. 
The only service that should be expected *indefinitely* is read access to IMM 
config data and read access to class descriptions. 

Config data is to be regarded as stable and original data in the imm. The 
service that is configured by that data may not always be able to realize the 
intention of that config data, but that does not alter the primary nature of 
the intention of the config data. So config data can be reliably provided 
during absence of IMMD and read by clients, where these clients can be 
confident that the config data honestly reflects the *intended* configuration. 
But intended may not be the same as actual.

The actual configuration of an OI/service is to be reflected in runtime data. 
Access to *cached* runtime data should be available for a short period of grace 
after loss of IMMD (see ticket #1156), but not indefinitely. Indefinite access 
to cached runtime data would violate the semantic contract for runtime data, 
which is to reflect the state of the OI/service that provides that data. Cached 
runtime data is never to be seen as original/raw data. Cached runtime data is 
always a copy (or a reflection) of some state that is actually part of the OI 
or service that owns them. If the OI/service is down or unreachable then the 
frozen state of the cached runtime data is inevitably going to degenerate in 
quality. That is, the cached runtime data will be reflecting an increasingly 
false picture of the actual state of the service. The degree of falsehood will 
increase with time, but it is also probable that the initial event of loss of 
SC could have a major impact on the *actual* state of that service. This would 
definitely be the case where that service *only* executes at an SC (two tier 
services).
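
The grace-period rule for cached runtime data could be expressed roughly as follows. This is only a sketch: the function name is hypothetical, and the length of the grace period is left as a parameter because the text above does not state it (see ticket #1156).

```python
def cached_runtime_readable(now, immd_lost_at, grace_seconds):
    """Decide whether cached runtime data may still be served.

    now, immd_lost_at: timestamps in seconds; immd_lost_at is None while
    IMMD is present.
    grace_seconds: hypothetical grace period; the real value is not given
    in the text above.
    """
    if immd_lost_at is None:
        # IMMD is present, so the cache is being kept up to date.
        return True
    # After loss of IMMD the cache only degrades, so serve it briefly.
    return (now - immd_lost_at) <= grace_seconds
```

Once the function returns False, a read of cached runtime attributes would be rejected rather than answered with increasingly stale values.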

Pure runtime attributes will obviously not be readable when the OI is detached, 
since this relies on a fetch of such values from the OI. Even for OIs that do 
not reside at SCs, a fetch is impossible because it uses the global fevs 
communication that goes via the IMMD. 

Access to persistent runtime data will work indefinitely, the same as for config 
data. But I will repeat here that persistent runtime data should not be used. 
It is a flawed hybrid concept with an unclear purpose. It is persistent yet not 
handled transactionally. It can be mutated via both the OI and the OM 
interface, OM side only for PRTAs that reside in config objects and only at 
config object creation time. It is this yet that, only this except that; it is 
simply a flawed concept with no clearly intended purpose, and it is frequently 
misunderstood by applications, generating trouble reports and even causing 
backwards compatibility issues relative to errors. But PRTAs will be readable 
during the absence of IMMD.

Access to class-descriptions is available. Such requests are local reads of 
data that is as stable as config data.

Admin-operations will NOT work during absence of IMMD since the request message 
goes over fevs.

None of the remaining and state mutating operations, ccb-handling, 
admin-owner-handling, implementer-handling, are available.
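
As a compact summary of the service levels described above, the following sketch tabulates what is available while IMMD is absent. The operation labels are informal names chosen for this example, not IMM API identifiers, and the enum and function are hypothetical.

```python
from enum import Enum


class Availability(Enum):
    INDEFINITE = "available indefinitely"
    GRACE_ONLY = "available only for a short grace period"
    UNAVAILABLE = "not available"


# Informal labels for this sketch only; they are not IMM API names.
HEADLESS_AVAILABILITY = {
    "read config data": Availability.INDEFINITE,
    "read class descriptions": Availability.INDEFINITE,
    "read persistent runtime data": Availability.INDEFINITE,
    "read cached runtime data": Availability.GRACE_ONLY,
    "read pure runtime attributes": Availability.UNAVAILABLE,
    "admin operations": Availability.UNAVAILABLE,
    "ccb handling": Availability.UNAVAILABLE,
    "admin-owner handling": Availability.UNAVAILABLE,
    "implementer handling": Availability.UNAVAILABLE,
}


def is_allowed(operation, within_grace_period):
    """Return True if the operation can be served while IMMD is absent."""
    level = HEADLESS_AVAILABILITY[operation]
    if level is Availability.INDEFINITE:
        return True
    if level is Availability.GRACE_ONLY:
        return within_grace_period
    return False
```

Everything that mutates state or goes over fevs ends up in the unavailable group, which matches the reasoning given for each case above.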

All OIs are incapacitated during the absence of IMMD. For an OI, the 
disappearance of IMMD service will initially appear indistinguishable from the 
disappearance (restart) of the local IMMND. The OI will get ERR_BAD_HANDLE and 
should then enter a retry loop of obtaining a fresh OI-handle. But subsequent 
to this we have both a divergence and a design choice to make. The divergence 
is that the absence of IMMD is to be tolerated indefinitely, while the absence 
of a locally restarted IMMND has a declared upper sanity limit of 60 seconds 
(max sync time). So it is certainly the case that at least some, probably 
many, OIs do not have an indefinite wait loop around obtaining a new 
oi-handle.

It is actually possible for the local IMMND to agree to the 
oi-handle-initialize request. This request in itself is purely node local. The 
problem is that the very next operation that the typical OI will request is 
saImmOiImplementerSet. That operation goes via fevs and is thus impossible to 
provide during absence of IMMD. The OI application may of course have a 
retry-loop here also. But the main problem here is that the OI is unlikely to 
be prepared/coded to tolerate indefinite wait around either 
saImmOiInitialize_2 or saImmOiImplementerSet. Another point is that even if the 
local IMMND accepts the oi-handle-initialize, there is nothing the OI can do 
with that handle except try to set implementer. So currently I am inclined to 
keep OI clients waiting on handle-initialize indefinitely, since it is more 
likely that the OI already has logic for tolerating a wait of 60 seconds 
around oi-handle-initialize.
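
The retry logic an OI would need can be sketched as below. This is a simulation, not code against the real IMM library: `initialize` and `set_implementer` are hypothetical callables standing in for saImmOiInitialize_2 and saImmOiImplementerSet, the string return codes are placeholders, and the assumption that a waiting client sees a "try again" style result is mine. With `max_wait=None` the loop waits indefinitely, which is what an OI must tolerate under absent IMMD; a finite `max_wait` models an OI that only expects the 60 second limit.

```python
import time

# Placeholder return codes for this sketch (not the real SaAisErrorT values).
OK = "OK"
ERR_TRY_AGAIN = "ERR_TRY_AGAIN"
ERR_BAD_HANDLE = "ERR_BAD_HANDLE"


def reacquire_oi_handle(initialize, set_implementer, sleep=time.sleep,
                        retry_interval=1.0, max_wait=None):
    """Retry until a fresh OI handle with implementer set is obtained.

    initialize: callable returning (return_code, handle).
    set_implementer: callable taking a handle and returning a return code.
    max_wait: seconds to keep trying, or None to wait indefinitely.
    """
    waited = 0.0
    while True:
        rc, handle = initialize()
        if rc == OK and set_implementer(handle) == OK:
            return handle
        if max_wait is not None and waited >= max_wait:
            raise TimeoutError("no usable OI handle after %.0f s" % waited)
        # Either step failed; back off and start over with a new handle.
        sleep(retry_interval)
        waited += retry_interval
```

An OI that receives ERR_BAD_HANDLE on any call would drop its old handle and call `reacquire_oi_handle(...)` before resuming normal operation.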




