- **status**: assigned --> accepted
---
**[tickets:#1232] Imm: Support for indefinite absence of IMMDs (Hydra)**
**Status:** accepted
**Milestone:** future
**Created:** Tue Dec 09, 2014 10:47 AM UTC by Anders Bjornerstedt
**Last Updated:** Tue Dec 09, 2014 01:42 PM UTC
**Owner:** Anders Bjornerstedt
This ticket tracks the IMM part of the general ticket (#1132) for supporting
indefinite absence of SCs (also called HydraV1). The overall goal is of course
to minimize the incapacitated "headless" time period. But that second part,
how to quickly re-locate and recover an SC, is not really a task for the IMM
service and thus not an issue for a complete Hydra solution for IMM. This is
why this ticket is tagged 'Hydra' and not 'HydraV1'.
Ticket #1132 only describes one part of a complete solution to a problem,
namely how to handle and recover from the loss of SC service. The part of how,
and how quickly, to recover SC functionality (we can call it HydraV2) is not
tracked by #1132. It therefore makes no sense for any user to enable the Hydra
feature (should it be delivered) as long as *only* HydraV1 is provided. It
also makes little sense to deliver only #1132 to an OpenSAF release, since it
only constitutes half a solution.
This ticket focuses on allowing the IMM service to "survive", with reasonably
consistent behavior, in the face of having no director (IMMD), for an
indefinite period of time. As soon as an SC returns, i.e. as soon as the IMMD
service returns, then the IMM service will normalize. This *is* covered by this
ticket.
During the absence of the IMMD, the IMM service will of course be severely
degraded. The only services that should be expected to work *indefinitely* are
read access to IMM config data and read access to class descriptions.
Config data is to be regarded as stable and original data in the IMM. The
service that is configured by that data may not always be able to realize the
intention of that config data, but that does not alter the primary nature of
that intention. So config data can be reliably provided during absence of the
IMMD and read by clients, and these clients can be confident that the config
data honestly reflects the *intended* configuration. But intended may not be
the same as actual.
The actual configuration of an OI/service is to be reflected in runtime data.
Access to *cached* runtime data should be available for a short period of grace
after loss of IMMD (see ticket #1156), but not indefinitely. Indefinite access
to cached runtime data would violate the semantic contract for runtime data,
which is to reflect the state of the OI/service that provides that data. Cached
runtime data is never to be seen as original/raw data. Cached runtime data is
always a copy (or a reflection) of some state that is actually part of the OI
or service that owns them. If the OI/service is down or unreachable then the
frozen state of the cached runtime data is inevitably going to degenerate in
quality. That is, the cached runtime data will be reflecting an increasingly
false picture of the actual state of the service. The degree of falsehood will
increase with time, but it is also probable that the initial event of loss of
an SC could have a major impact on the *actual* state of that service. This
would definitely be the case where that service *only* executes at an SC
(two-tier services).
Pure runtime attributes will obviously not be readable when the OI is
detached, since reading them relies on fetching such values from the OI. Even
for OIs that do not reside at SCs, such a fetch is impossible because it uses
the global fevs communication that goes via the IMMD.
Access to persistent runtime data will work indefinitely, the same as for
config data. But I will repeat here that persistent runtime data should not be
used. It is a flawed hybrid concept with an unclear purpose. It is persistent,
yet not handled transactionally. It can be mutated via both the OI and the OM
interface, though on the OM side only for PRTAs that reside in config objects,
and only at config object creation time. It is this yet that, only this except
that: it is simply a flawed concept with no clearly intended purpose, and it
is frequently misunderstood by applications, generating trouble reports and
even causing backwards compatibility issues relative to errors. But persistent
runtime attributes will be readable during the absence of the IMMD.
Access to class-descriptions is available. Such requests are local reads of
data that is as stable as config data.
Admin-operations will NOT work during the absence of the IMMD, since the
request message goes over fevs.
None of the remaining state-mutating operations (ccb-handling,
admin-owner-handling, implementer-handling) are available.
All OIs are incapacitated during the absence of the IMMD. For an OI, the
disappearance of the IMMD service will initially appear indistinguishable from
the disappearance (restart) of the local IMMND. The OI will get ERR_BAD_HANDLE
and should then enter a retry loop to obtain a fresh OI-handle. But subsequent
to this we have both a divergence and a design choice to make. The divergence
is that the absence of the IMMD is to be tolerated indefinitely, while the
absence of a locally restarted IMMND has a declared upper sanity limit of 60
seconds (max sync time). So it is certainly the case that at least some,
probably many, OIs do not have an indefinite wait loop around obtaining a new
OI-handle.
It is actually possible for the local IMMND to accept the
oi-handle-initialize request, since this request in itself is purely node
local. The problem is that the very next operation the typical OI will request
is saImmOiImplementerSet. That operation goes via fevs and is thus impossible
to provide during absence of the IMMD. The OI application may of course have a
retry loop here also. But the main problem is that the OI is unlikely to be
prepared/coded to tolerate an indefinite wait around either
saImmOiInitialize_2 or saImmOiImplementerSet. Another point is that even if
the local IMMND accepts the oi-handle-initialize, there is nothing the OI can
do with that handle except try to set implementer. So currently I am inclined
to keep OI clients waiting on handle-initialize indefinitely, since it is more
likely that the OI already has logic for tolerating a wait of 60 seconds
around oi-handle-initialize.