An opposite strategy for improving SMF would be to remove the need
for escalating an unplanned cluster restart to a restore. In that
case, having PBE enabled would make sense, since all the incremental
writing done by the PBE would no longer be pointless.
But if that approach is taken, SMF should redesign its PRTOs
(persistent runtime objects) as config objects and use CCBs for
writing to the persistent campaign objects. The SMF service can still
"protect" these objects from accidental writes by operators, by
permanently setting the admin-owner to osafSmfService. Avoiding PRTOs
is very desirable because the PRTO concept is fundamentally flawed.
---
**[tickets:#128] SMF: Remove SMF's extensive use of persistent runtime
attributes.**
**Status:** unassigned
**Created:** Mon May 13, 2013 08:56 AM UTC by Ingvar Bergström
**Last Updated:** Thu Dec 19, 2013 10:50 AM UTC
**Owner:** nobody
http://devel.opensaf.org/ticket/2432
SMF makes extensive use of persistent runtime attributes in the imm.
The SMF standard apparently does not explicitly state that these
runtime attributes are persistent, which in my view means they are
not persistent according to the standard. The OpenSAF SMF
implementation nevertheless interprets them as persistent.
As explained in ticket #2431, the way SMF uses persistent runtime
attributes actually causes more problems than it solves. The only
problem it "solves" is the rare case where a campaign involves a
planned cluster restart: SMF can then persist its campaign state
(using an explicit immdump, since PBE must be disabled) in such a way
that, after the planned cluster restart step, SMF can continue the
campaign.
There must be a better way for the SMF to checkpoint the necessary
campaign state.
An SMF campaign can be seen as a huge transaction. You want the
entire campaign either to succeed atomically and persistently or to
roll back atomically and persistently.
The problem with persistent runtime attributes is that they cannot
be included in any transaction (a CCB in the imm). Every single
persistent runtime attribute update is a separate, independent
action.
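The difference can be illustrated with a toy model in Python.
Everything in it is hypothetical (PersistentStore, write_attr,
apply_ccb and the attribute names are made up; none of this is the
OpenSAF IMM API). It only sketches what a restart in the middle of a
multi-attribute update leaves behind when each write is persisted on
its own, as with persistent runtime attributes, versus when the
writes are collected and applied as one atomic batch, as with a CCB.

```python
class SimulatedRestart(Exception):
    """Emulates a cluster restart in the middle of an update."""


class PersistentStore:
    """Hypothetical stand-in for state that survives a cluster restart."""

    def __init__(self, data):
        self.data = dict(data)

    def write_attr(self, name, value):
        # Like one persistent runtime attribute update: persisted at once,
        # independently of every other update.
        self.data[name] = value

    def apply_ccb(self, changes):
        # Like applying a CCB: the new image is built first and swapped in
        # with a single assignment, so all changes land together or none do.
        new_data = dict(self.data)
        new_data.update(changes)
        self.data = new_data


def update_independently(store, changes, restart_after):
    for i, (name, value) in enumerate(changes.items()):
        if i == restart_after:
            raise SimulatedRestart()
        store.write_attr(name, value)


def update_atomically(store, changes, restart_after):
    staged = {}
    for i, (name, value) in enumerate(changes.items()):
        if i == restart_after:
            raise SimulatedRestart()
        staged[name] = value  # collected in the batch, not yet persistent
    store.apply_ccb(staged)


def state_after_restart(update, restart_after=1):
    """Run one update, restarting after `restart_after` writes; return what survived."""
    store = PersistentStore({"step_state": "EXECUTING", "campaign_state": "EXECUTING"})
    changes = {"step_state": "COMPLETED", "campaign_state": "EXECUTION_COMPLETED"}
    try:
        update(store, changes, restart_after)
    except SimulatedRestart:
        pass
    return store.data


print(state_after_restart(update_independently))
# {'step_state': 'COMPLETED', 'campaign_state': 'EXECUTING'}
print(state_after_restart(update_atomically))
# {'step_state': 'EXECUTING', 'campaign_state': 'EXECUTING'}
```

With a restart after the first write, the independent updates leave a
step marked COMPLETED while the campaign is still EXECUTING, a mix
that never existed as a consistent state. The atomic batch leaves the
old, consistent state behind.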
The SMF runtime attributes should remain non-persistent runtime
attributes, reflecting the campaign state *towards* the operator.
They should not be used by SMF as a store-and-retrieve utility for
fetching its own campaign state after a cluster restart step, which
is not what runtime attributes are for.
A much better solution is for SMF to create private config data in
which it stores any campaign state that must survive a cluster
restart, writing that data with CCBs (atomic write transactions).
For most campaigns (those without a cluster restart step), SMF would
only store a marker saying that the campaign is not yet finished, so
that any unplanned cluster restart escalates to a restore.
For the rare, exceptional campaigns that do require a cluster
restart step, SMF would execute a CCB storing the needed campaign
state just before the cluster restart step, in the same place where
it currently invokes an immdump to checkpoint the imm database.
With this enhancement in place, campaigns will be safer and the
imm PBE need not be disabled during campaigns.
Some campaigns that do encounter an unplanned cluster restart
would then not have to be escalated to a *restore*. Instead SMF
could in some cases perform a rollback after the unplanned
cluster restart.
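A minimal sketch of the decision SMF could make after a cluster
restart under this proposal, assuming a hypothetical private config
object (CampaignRecord) that exists while a campaign is unfinished and
carries a checkpoint only when written by the CCB just before a
planned cluster restart step. The class, its fields and the
can_roll_back flag are illustrative assumptions; the ticket does not
say in which cases a rollback would be possible, so that judgement is
left to the caller.

```python
from dataclasses import dataclass
from enum import Enum, auto
from typing import Optional


class Action(Enum):
    NONE = auto()      # no campaign was in progress
    CONTINUE = auto()  # planned cluster restart step: resume from checkpoint
    ROLLBACK = auto()  # unplanned restart, but rollback is still possible
    RESTORE = auto()   # unplanned restart: escalate to a restore


@dataclass
class CampaignRecord:
    """Hypothetical private config object, only ever written via CCBs."""

    campaign: str
    # Written by the CCB executed just before a planned cluster restart
    # step; None means the record is only the "not finished" marker.
    restart_checkpoint: Optional[str] = None


def action_after_restart(record: Optional[CampaignRecord], can_roll_back: bool) -> Action:
    if record is None:
        return Action.NONE
    if record.restart_checkpoint is not None:
        # The restart was the planned cluster restart step.
        return Action.CONTINUE
    # Only the marker exists, so the restart was unplanned.
    return Action.ROLLBACK if can_roll_back else Action.RESTORE
```

One consequence of this design: once the campaign has continued past
the cluster restart step, SMF would need another CCB to clear the
checkpoint, otherwise a later unplanned restart would be mistaken for
the planned one.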
Changed 16 months ago by anders
The best way for SMF to avoid the need for persistent runtime
attributes completely is to disallow any step that is a cluster
reboot, unless it is the last step in the campaign.
Any additional steps needed after the cluster reboot would have to be
performed in a new campaign.
The only drawback is that this requires one manual step, namely that
the operator invokes the second campaign. The fact that the operator
managed to invoke the first campaign makes it likely, but not certain,
that they can also invoke the second one.
Since some operators could forget to do this, an elaboration would be
to have a simple bridging mechanism. As it turns out, that bridging
mechanism has to be there anyway: after a cluster restart, SMF must
have some way of finding out whether the restart was planned. If it
was not planned, SMF must invoke a restore. If it was planned, there
could also be just enough information to point at a continuation
campaign.
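This alternative can be sketched the same way: reject campaigns in
which a cluster reboot is not the last step, and after a restart use a
small record to tell a planned reboot from an unplanned one and, if
planned, find the continuation campaign. Step, RestartRecord and the
returned action strings are hypothetical illustrations, not SMF
interfaces.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Step:
    name: str
    cluster_reboot: bool = False


def validate_campaign(steps: List[Step]) -> None:
    """Reject a campaign in which a cluster reboot is not the final step."""
    for step in steps[:-1]:
        if step.cluster_reboot:
            raise ValueError(f"cluster reboot step {step.name!r} must be the last step")


@dataclass
class RestartRecord:
    """Hypothetical bridging record, present while a campaign is unfinished."""

    campaign: str
    planned_reboot: bool = False        # set just before the final cluster reboot step
    continuation: Optional[str] = None  # follow-up campaign to invoke, if any


def after_cluster_restart(record: Optional[RestartRecord]) -> str:
    if record is None:
        return "none"  # no campaign was in progress
    if not record.planned_reboot:
        return "restore"  # unplanned restart during a campaign
    if record.continuation is not None:
        return f"start {record.continuation}"  # replaces the manual step
    return "done"
```

Because the reboot is always the final step, the record never has to
hold mid-campaign state; it only needs the planned flag and, at most,
the name of the next campaign.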
---
Sent from sourceforge.net because [email protected] is
subscribed to https://sourceforge.net/p/opensaf/tickets/
_______________________________________________
Opensaf-tickets mailing list
[email protected]
https://lists.sourceforge.net/lists/listinfo/opensaf-tickets