Let me put it this way: the only campaign "runtime data" that needs to be
persisted is either the "campaign stack", for the case when a planned cluster
restart is just about to happen, or some kind of checkpoint of campaign state
for fallback in case of an unplanned cluster restart. The latter case, fallback
after an unplanned cluster restart, is not supported today, but we have to
support it in the future if OpenSAF is ever going to have a chance of getting
HA credentials.

You need to have less persistent data to become more robust.
Or at least less variation in the state space of that persistent data.

The reason is that currently, with a huge amount of persistent runtime data that
is not transactionally updated (between PRTOs), the space of possible persistent
states to "recover with" becomes huge and even unbounded. A failure to handle
any such state results again in a restore. Yes, restore is the case *anyway*
today, but that is actually unacceptable. I am talking about the way forward to
avoid restore here. Any restore translates to an essentially irrecoverable loss
in the fight for good HA statistics. The risk of an unplanned cluster restart is
higher during an upgrade campaign than during normal operation. And precisely in
this vulnerable state we have designed away the recovery method of an automatic
cluster restart. That is unacceptable if you insist on HA.

To make automatic cluster restart realistic, you need to reduce the persistent
state space that is possible after such a cluster restart, i.e. the states that
have to be dealt with and anticipated by software written by us. Reduction of
persistent state space is accomplished by not making things persistent that
don't need to be persistent, and by using transactions, which group larger
writes into atomic updates (again reducing the state space). There is a reason
transactions are used in databases and in distributed systems. PRTOs are
persistent data that is non-transactional, a flawed concept.
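
To illustrate the state-space argument, here is a minimal sketch (not OpenSAF
code) that counts the persistent states a crash could leave behind. It assumes
a simplified model where each write is either on disk or not; the function
names are hypothetical and chosen only for this example.

```python
from itertools import product


def states_independent(n_writes):
    """Each write is persisted on its own, in no guaranteed order:
    after a crash any subset of the writes may be on disk."""
    return set(product((False, True), repeat=n_writes))


def states_ordered(n_writes):
    """Writes are persisted one by one in a fixed order:
    after a crash some prefix of the writes is on disk."""
    return {tuple(i < k for i in range(n_writes)) for k in range(n_writes + 1)}


def states_atomic(n_writes):
    """All writes are grouped in one transaction:
    after a crash either none or all of them are on disk."""
    return {(False,) * n_writes, (True,) * n_writes}


if __name__ == "__main__":
    for n in (1, 4, 10):
        print(n,
              len(states_independent(n)),
              len(states_ordered(n)),
              len(states_atomic(n)))
```

Under this model the recovery code has to anticipate a number of states that
grows with the number of writes unless they are grouped into one transaction,
in which case there are only two states however many writes are in the group.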

/AndersBj



-----Original Message-----
From: Anders Björnerstedt [mailto:anders.bjornerst...@ericsson.com] 
Sent: den 19 december 2013 15:26
To: Ingvar Bergström; Bertil Engelholm
Cc: opensaf-devel@lists.sourceforge.net
Subject: Re: [devel] [PATCH 0 of 1] Review Request for SMF #677

I am not worried about performance. That's not the issue.

I am worried about robustness (service unavailability), the degeneration of
other services.
You say you are not convinced that there is a difference in robustness.
Well, I just explained the difference in robustness. The difference is
ERR_TIMEOUT versus ERR_TRY_AGAIN. That translates to a difference in
robustness.

Of course, what I would really like to see is that you change the SMF design so
that it does not use PRTOs at all. You could convert them to config data.
But more fundamentally, the campaign objects don't really need to be persistent.
As I understand it, you start with a campaign file in some XML format (that's
the persistent start state).
This is then converted to campaign runtime objects, which are derivative
objects that contain the same information plus some runtime data that almost
never needs to be persistent.
For the few cases where you do need to perform a cluster restart step, you
should be able to persist the limited amount of runtime data that you really
need to have persistent there.
That should be possible to do either in a CCB (or possibly in ONE PRTO).

/AndersBj
 

-----Original Message-----
From: Ingvar Bergström
Sent: den 19 december 2013 15:14
To: Anders Björnerstedt; Bertil Engelholm
Cc: opensaf-devel@lists.sourceforge.net
Subject: RE: [devel] [PATCH 0 of 1] Review Request for SMF #677

Campaigns I have seen have typically one or two procedures. So from SMF there 
will be one PRTO delete operation for each procedure.
Saying that, I'm not convinced the system will gain performance/robustness if 
the PBE is turned off/on (again) during the SMF commit operation.

/Ingvar

-----Original Message-----
From: Anders Björnerstedt
Sent: den 19 december 2013 14:57
To: Ingvar Bergström; Bertil Engelholm
Cc: opensaf-devel@lists.sourceforge.net
Subject: RE: [devel] [PATCH 0 of 1] Review Request for SMF #677

How many subtrees would you typically have?

If the entire campaign is ONE tree, you could delete the entire campaign as one
cascading delete.
That would actually be a better solution than deleting lots of objects
individually, since one cascading delete will be one SQLite write/transaction.
Of course, the only reason you need to delete the persistent structure is
that you made it persistent. The irony is that the persistence is a total
waste of resources for all campaigns except those where there is a step that
has a planned cluster restart.
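
As a rough illustration of the difference, here is a sketch using Python's
sqlite3 module. It is not the PBE schema or code: the single-column table, the
DN strings and the function names are assumptions made up for the example.

```python
import sqlite3


def make_db(n_children):
    """Create an in-memory database holding one campaign root object
    and n_children objects below it (hypothetical DNs)."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE objects (dn TEXT PRIMARY KEY)")
    root = "safSmfCampaign=demo"
    dns = [root] + [f"safSmfProc=p{i},{root}" for i in range(n_children)]
    with conn:  # one transaction for the whole setup
        conn.executemany("INSERT INTO objects VALUES (?)",
                         [(dn,) for dn in dns])
    return conn, root, dns


def delete_individually(conn, dns):
    """Delete objects one by one, committing after each delete.
    Returns the number of transactions used."""
    commits = 0
    for dn in reversed(dns):  # children first, root last
        with conn:
            conn.execute("DELETE FROM objects WHERE dn = ?", (dn,))
        commits += 1
    return commits


def delete_cascading(conn, root):
    """Delete the root and everything below it in a single transaction.
    Returns the number of transactions used."""
    with conn:
        conn.execute("DELETE FROM objects WHERE dn = ? OR dn LIKE ?",
                     (root, "%," + root))
    return 1


def remaining(conn):
    return conn.execute("SELECT COUNT(*) FROM objects").fetchone()[0]


if __name__ == "__main__":
    conn, root, dns = make_db(100)
    print("individual:", delete_individually(conn, dns), "transactions")
    conn, root, dns = make_db(100)
    print("cascading: ", delete_cascading(conn, root), "transaction")
```

Both variants end with an empty table; the difference is only in how many
separate commits the database has to perform along the way.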

The term "safe" is relative. What I am worried about is the risk that a burst
of individual PRTO writes, on the order of say 100 or more, could hog the
PBE indefinitely. The problem here is the queue buildup, where the application
would get ERR_TIMEOUT instead of ERR_TRY_AGAIN. Particularly if an SC is
restarting and DRBD is syncing at the same time.

Yes, enabling PBE will regenerate the database and that also takes time, but
services will get TRY_AGAIN in this state when attempting persistent writes. So
this is much better behaved, and services must be designed so that they can deal
with this. But few services are equipped to deal well with getting repeated
ERR_TIMEOUT (on persistent write requests).
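
A minimal sketch of why the two error codes differ for a client, assuming
(as is the usual reading of the SAF return codes) that TRY_AGAIN means the
request was rejected without effect, while TIMEOUT leaves the outcome unknown.
The exception classes and the helper below are hypothetical stand-ins, not the
IMM API.

```python
import time


class TryAgain(Exception):
    """Stand-in for ERR_TRY_AGAIN: request rejected, nothing was done."""


class Timeout(Exception):
    """Stand-in for ERR_TIMEOUT: the outcome of the request is unknown."""


def write_with_retry(write, attempts=5, delay=0.01, sleep=time.sleep):
    """Call write() and retry only on TryAgain.

    A Timeout is not retried here: the write may or may not have been
    applied, so the caller has to find out which before acting again.
    """
    for attempt in range(1, attempts + 1):
        try:
            return write()
        except TryAgain:
            if attempt == attempts:
                raise
            sleep(delay)


if __name__ == "__main__":
    calls = {"n": 0}

    def flaky_write():
        # Rejected twice (e.g. while the PBE is regenerating), then accepted.
        calls["n"] += 1
        if calls["n"] < 3:
            raise TryAgain()
        return "ok"

    print(write_with_retry(flaky_write), "after", calls["n"], "calls")
```

The retry loop for TRY_AGAIN is trivial to write. There is no equally simple
loop for TIMEOUT, which is the point being made above.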

/AndersBj


-----Original Message-----
From: Ingvar Bergström
Sent: den 19 december 2013 14:04
To: Anders Björnerstedt; Bertil Engelholm
Cc: opensaf-devel@lists.sourceforge.net
Subject: RE: [devel] [PATCH 0 of 1] Review Request for SMF #677

Suppose we switch on PBE in completed state, switch it off at the start of
commit, and then switch it on again at the end of commit.
Is this more effective than keeping the PBE on during commit? I guess the IMM
content is dumped twice if PBE is turned off during commit?
I hope IMM is safe, so what would be safer? I thought performance was the
main concern.

I don't know if it matters, but SMF PRTOs are deleted subtree by subtree, where
the tops of the trees are the procedure objects. So there is a very limited
number of delete operations needed to delete SMF internal objects.

You see no problem switching PBE on/off/on in sequence? If the user commits the 
campaign fast, the first PBE on may not be finished before PBE off and then PBE 
on are received again.

/Ingvar

-----Original Message-----
From: Anders Björnerstedt
Sent: den 19 december 2013 11:20
To: Ingvar Bergström; Bertil Engelholm
Cc: opensaf-devel@lists.sourceforge.net
Subject: RE: [devel] [PATCH 0 of 1] Review Request for SMF #677

Well, the problem I am worried about is that if there is a large number of PRTOs
to be deleted, then that is definitely a risky operation in itself with PBE
turned on.

But if you have deleted the bulk of the SMF PRTOs before the PBE is turned on,
then no problem.
Turning off the PBE again would be an advantage only IF a substantial number
of PRTOs may remain to be deleted; in that case it would be safer (less risk
of service unavailability).

/AndersBj

-----Original Message-----
From: Ingvar Bergström
Sent: den 19 december 2013 11:06
To: Anders Björnerstedt; Bertil Engelholm
Cc: opensaf-devel@lists.sourceforge.net
Subject: RE: [devel] [PATCH 0 of 1] Review Request for SMF #677

That's a valid question. The difference from the existing solution is that the
wrapup actions (if any), which are executed at commit, will be executed with PBE
turned on.
Normally a limited number of IMM operations are defined in the campaign wrapup 
portion of the campaign. That's the reason PBE in the proposal is not turned 
off again for the commit.
To me the advantage of turning off/on PBE for the commit operation is very 
limited.

/Ingvar

-----Original Message-----
From: Anders Björnerstedt
Sent: den 19 december 2013 09:36
To: Ingvar Bergström; Bertil Engelholm
Cc: opensaf-devel@lists.sourceforge.net
Subject: RE: [devel] [PATCH 0 of 1] Review Request for SMF #677

One question on this SMF enhancement.

I understand that PBE will be switched on towards the end of a campaign, but 
earlier than before.

Does this mean that there remains a phase of cleanup/deletes of lots of PRTOs
to be done after testing?
If so then *maybe* PBE should be turned off again, during the final cleanup of 
campaign PRTOs, then on again. 

/AndersBj
 

-----Original Message-----
From: Ingvar Bergstrom [mailto:ingvar.bergst...@ericsson.com]
Sent: den 19 december 2013 08:49
To: Bertil Engelholm
Cc: opensaf-devel@lists.sourceforge.net
Subject: [devel] [PATCH 0 of 1] Review Request for SMF #677

Summary: smfd: turn on PBE in state completed
Review request for Trac Ticket(s): 677
Peer Reviewer(s): bertil
Pull request to:
Affected branch(es): default
Development branch: default

--------------------------------
Impacted area       Impact y/n
--------------------------------
 Docs                    n
 Build system            n
 RPM/packaging           n
 Configuration files     n
 Startup scripts         n
 SAF services            n
 OpenSAF services        y
 Core libraries          n
 Samples                 n
 Tests                   n
 Other                   n


Comments (indicate scope for each "y" above):
---------------------------------------------
smfd: turn on PBE in state completed

changeset bf8be15d6ab7daa218e2cf530d0064a0efabf0f5
Author: Ingvar Bergstrom <ingvar.bergst...@ericsson.com>
Date:   Thu, 19 Dec 2013 08:45:32 +0100

        smfd: turn on PBE in state completed [677]

        To avoid the need of restore in case of cluster reboot, SMF turns on
        PBE in completed state.


Complete diffstat:
------------------
 osaf/services/saf/smfsv/smfd/SmfCampState.cc       |  14 +++++++++++---
 osaf/services/saf/smfsv/smfd/SmfUpgradeCampaign.cc |   2 +-
 2 files changed, 12 insertions(+), 4 deletions(-)


Testing Commands:
-----------------


Testing, Expected Results:
--------------------------


Conditions of Submission:
-------------------------


Arch      Built     Started    Linux distro
-------------------------------------------
mips        n          n
mips64      n          n
x86         n          n
x86_64      y          y
powerpc     n          n
powerpc64   n          n


Reviewer Checklist:
-------------------
[Submitters: make sure that your review doesn't trigger any checkmarks!]


Your checkin has not passed review because (see checked entries):

___ Your RR template is generally incomplete; it has too many blank entries
    that need proper data filled in.

___ You have failed to nominate the proper persons for review and push.

___ Your patches do not have proper short+long header

___ You have grammar/spelling in your header that is unacceptable.

___ You have exceeded a sensible line length in your headers/comments/text.

___ You have failed to put in a proper Trac Ticket # into your commits.

___ You have incorrectly put/left internal data in your comments/files
    (i.e. internal bug tracking tool IDs, product names etc)

___ You have not given any evidence of testing beyond basic build tests.
    Demonstrate some level of runtime or other sanity testing.

___ You have ^M present in some of your files. These have to be removed.

___ You have needlessly changed whitespace or added whitespace crimes
    like trailing spaces, or spaces before tabs.

___ You have mixed real technical changes with whitespace and other
    cosmetic code cleanup changes. These have to be separate commits.

___ You need to refactor your submission into logical chunks; there is
    too much content into a single commit.

___ You have extraneous garbage in your review (merge commits etc)

___ You have giant attachments which should never have been sent;
    Instead you should place your content in a public tree to be pulled.

___ You have too many commits attached to an e-mail; resend as threaded
    commits, or place in a public tree for a pull.

___ You have resent this content multiple times without a clear indication
    of what has changed between each re-send.

___ You have failed to adequately and individually address all of the
    comments and change requests that were proposed in the initial review.

___ You have a misconfigured ~/.hgrc file (i.e. username, email etc)

___ Your computer have a badly configured date and time; confusing the
    the threaded patch review.

___ Your changes affect IPC mechanism, and you don't present any results
    for in-service upgradability test.

___ Your changes affect user manual and documentation, your patch series
    do not contain the patch that updates the Doxygen manual.


------------------------------------------------------------------------------
Rapidly troubleshoot problems before they affect your business. Most IT 
organizations don't have a clear picture of how application performance affects 
their revenue. With AppDynamics, you get 100% visibility into your Java,.NET, & 
PHP application. Start your 15-day FREE TRIAL of AppDynamics Pro!
http://pubads.g.doubleclick.net/gampad/clk?id=84349831&iu=/4140/ostg.clktrk
_______________________________________________
Opensaf-devel mailing list
Opensaf-devel@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/opensaf-devel
