I disagree with this. During an upgrade I see it as an advantage to have PBE disabled. For upgrades the goal should be to have an unplanned cluster restart result in an automatic fallback to a checkpoint of configuration data and software. You typically don't want a restore back to the exact state where you got the unplanned cluster restart, because it is likely to happen again. So I see disabling PBE as a *feature* during upgrade campaigns.
Possibly a distinction could be made between different kinds of campaigns,
but that complicates things.

Note: application data that is provided by a service to clients outside the
cluster must not be stored in the IMM. Such data should preferably not be
rewound even if there is a fallback of software and configuration to an
earlier state.

It's not an issue just for 1PBE. It is just as relevant for 2PBE.
We need to get rid of "restore" as the only way of handling an unplanned
cluster restart during an upgrade campaign. That is the main thing, if you
are concerned about HA.

/AndersBj

-----Original Message-----
From: Hans Feldt
Sent: den 19 december 2013 16:02
To: Ingvar Bergström; Anders Björnerstedt; Bertil Engelholm
Cc: opensaf-devel@lists.sourceforge.net
Subject: RE: [devel] [PATCH 0 of 1] Review Request for SMF #677

I would prefer if PBE was enabled at system start and never touched again.
That I think should be our goal. In the next release we can see if we can
reach it.
I guess this is only an issue with 1PBE IMM?
So just moving the enable to an earlier point in time seems fine to me and
in line with the (my) end goal.

/Hans

> -----Original Message-----
> From: Ingvar Bergström [mailto:ingvar.bergst...@ericsson.com]
> Sent: den 19 december 2013 15:57
> To: Anders Björnerstedt; Bertil Engelholm
> Cc: opensaf-devel@lists.sourceforge.net
> Subject: Re: [devel] [PATCH 0 of 1] Review Request for SMF #677
>
> This patch is only about moving the point where the IMM content becomes
> persistent. This will improve system robustness during the test period.
> On this I think we can agree.
>
> What I want to know is whether we should let the quite few delete
> operations during commit be executed with PBE on, or whether it is better
> to temporarily turn the PBE off and on again during the commit operation.
>
> The work to remove the PRTOs, or handle them in another way, is a separate
> enhancement ticket.
>
> So which one would you recommend?
> 1) Let the (quite few) PRTO delete operations during commit execute with
>    PBE enabled.
> 2) Turn the PBE off and on again for a short time around the few PRTO
>    delete operations during commit.
>
> /Ingvar
>
> -----Original Message-----
> From: Anders Björnerstedt
> Sent: den 19 december 2013 15:26
> To: Ingvar Bergström; Bertil Engelholm
> Cc: opensaf-devel@lists.sourceforge.net
> Subject: RE: [devel] [PATCH 0 of 1] Review Request for SMF #677
>
> I am not worried about performance. That's not the issue.
>
> I am worried about robustness (service unavailability), the degeneration of
> other services.
> You say you are not convinced that there is a difference in robustness.
> Well, I just explained the difference in robustness. The difference is
> ERR_TIMEOUT versus ERR_TRY_AGAIN. That translates to a difference in
> robustness.
>
> Of course what I would really like to see is that you change the SMF
> design so that it does not use PRTOs at all. You could convert to
> config data.
> But more fundamentally, the campaign objects don't really need to be
> persistent.
> As I understand it you start with a campaign file in some XML format (that's
> the persistent start state).
> This is then converted to campaign runtime objects, which are
> derivative objects that contain the same information plus some runtime data
> that almost never needs to be persistent.
> For the few cases where you do need to perform a cluster restart
> step, you should be able to persist the limited amount of runtime data
> that you really need to have persistent.
> That should be possible to do either in a CCB (or possibly in ONE PRTO).
>
> /AndersBj
>
>
> -----Original Message-----
> From: Ingvar Bergström
> Sent: den 19 december 2013 15:14
> To: Anders Björnerstedt; Bertil Engelholm
> Cc: opensaf-devel@lists.sourceforge.net
> Subject: RE: [devel] [PATCH 0 of 1] Review Request for SMF #677
>
> Campaigns I have seen have typically one or two procedures.
> So from SMF there will be one PRTO delete operation for each procedure.
> That said, I'm not convinced the system will gain
> performance/robustness if the PBE is turned off/on (again) during the SMF
> commit operation.
>
> /Ingvar
>
> -----Original Message-----
> From: Anders Björnerstedt
> Sent: den 19 december 2013 14:57
> To: Ingvar Bergström; Bertil Engelholm
> Cc: opensaf-devel@lists.sourceforge.net
> Subject: RE: [devel] [PATCH 0 of 1] Review Request for SMF #677
>
> How many subtrees would you typically have?
>
> If the entire campaign is ONE tree you could delete the entire campaign as
> one cascading delete.
> That would actually be a better solution than deleting lots of
> objects individually, since one cascading delete will be one SQLite
> write/transaction. Of course the only reason you need to delete the
> persistent structure is because you made it persistent. The irony is that
> the persistence is a total waste of resources for all campaigns except where
> there is a step that has a planned cluster restart.
>
> The term "safe" is relative. What I am worried about is the risk of
> a burst of individual PRTO writes, on the order of say 100 or more,
> that could hog the PBE indefinitely. The problem here is the queue buildup,
> where the application would get ERR_TIMEOUT instead of ERR_TRY_AGAIN.
> Particularly if an SC is restarting and DRBD is syncing at the same time.
>
> Yes, enabling PBE will regenerate the database and that also takes
> time, but services will get TRY_AGAIN in this state when attempting
> persistent writes. So this is much better behaved, and services must be
> designed so that they can deal with this. But few services are equipped to
> deal well with getting repeated ERR_TIMEOUT (on persistent write requests).
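
A minimal sketch of the distinction drawn above, in Python rather than the C API: ERR_TRY_AGAIN is a clean refusal that a client can simply retry, while ERR_TIMEOUT leaves it unknown whether the write was applied. The `Rc` enum, `OutcomeUnknown` and `persistent_write` are hypothetical names chosen for illustration, not OpenSAF interfaces.

```python
import time
from enum import Enum


class Rc(Enum):
    """Hypothetical stand-ins for the SA_AIS_* return codes."""
    OK = "OK"
    ERR_TRY_AGAIN = "ERR_TRY_AGAIN"
    ERR_TIMEOUT = "ERR_TIMEOUT"


class OutcomeUnknown(Exception):
    """Raised on ERR_TIMEOUT: the write may or may not have been applied."""


def persistent_write(op, max_attempts=5, delay=0.1, sleep=time.sleep):
    """Call op() until it stops returning ERR_TRY_AGAIN.

    ERR_TRY_AGAIN is treated as "not done, safe to repeat".
    ERR_TIMEOUT is not retried blindly, because the caller cannot
    tell whether the request took effect.
    Returns the number of attempts that were needed.
    """
    for attempt in range(1, max_attempts + 1):
        rc = op()
        if rc is Rc.OK:
            return attempt
        if rc is Rc.ERR_TIMEOUT:
            raise OutcomeUnknown("write may or may not have been applied")
        sleep(delay)  # rc is ERR_TRY_AGAIN: back off, then retry
    raise RuntimeError("still ERR_TRY_AGAIN after %d attempts" % max_attempts)
```

A caller would pass a function that issues the actual write request. How to recover from `OutcomeUnknown` (for example by reading the object back) is left to the service, which is the part few services handle well.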
>
> /AndersBj
>
>
> -----Original Message-----
> From: Ingvar Bergström
> Sent: den 19 december 2013 14:04
> To: Anders Björnerstedt; Bertil Engelholm
> Cc: opensaf-devel@lists.sourceforge.net
> Subject: RE: [devel] [PATCH 0 of 1] Review Request for SMF #677
>
> Suppose we switch on PBE in completed state, then switch off PBE at the
> start of commit, then switch on PBE again at the end of commit.
> Is this more effective than keeping the PBE on during commit? I guess the IMM
> content is dumped twice if PBE is turned off during commit?
> I hope IMM is safe, so what would be safer? I thought performance was the
> main concern.
>
> I don't know if it matters, but SMF PRTOs are deleted subtree by subtree,
> where the tops of the trees are the procedure objects. So there is a very
> limited number of delete operations to delete SMF internal objects.
>
> You see no problem switching PBE on/off/on in sequence? If the user
> commits the campaign fast, the first PBE on may not be finished before PBE
> off and then PBE on are received again.
>
> /Ingvar
>
> -----Original Message-----
> From: Anders Björnerstedt
> Sent: den 19 december 2013 11:20
> To: Ingvar Bergström; Bertil Engelholm
> Cc: opensaf-devel@lists.sourceforge.net
> Subject: RE: [devel] [PATCH 0 of 1] Review Request for SMF #677
>
> Well, the problem I am worried about is if there is a large number of
> PRTOs to be deleted. Then that is definitely a risky operation in itself with
> PBE turned on.
>
> But if you have deleted the bulk of the SMF PRTOs before the PBE is turned
> on, then no problem.
> The advantage of turning off the PBE again would then be that IF there may
> be a substantial number of PRTOs to be deleted, then that would be safer
> (less risk of service unavailability).
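
The difference between many individual persistent deletes and one delete per subtree can be sketched with Python's `sqlite3` module. The `objects` table and the DN strings below are made up for the example and are not the schema or naming used by the PBE; the sketch only shows how the number of transactions differs between the two approaches.

```python
import sqlite3


def make_db(n_steps):
    """In-memory database holding one procedure object and n_steps children.

    Hypothetical layout: a single table keyed by DN.
    """
    db = sqlite3.connect(":memory:")
    db.execute("CREATE TABLE objects (dn TEXT PRIMARY KEY)")
    rows = [("proc=proc1",)]
    rows += [("step=%04d,proc=proc1" % i,) for i in range(n_steps)]
    db.executemany("INSERT INTO objects VALUES (?)", rows)
    db.commit()
    return db


def delete_one_by_one(db):
    """Delete every object with its own commit (one transaction each).

    Returns (rows_deleted, transactions).
    """
    dns = [row[0] for row in db.execute("SELECT dn FROM objects")]
    transactions = 0
    for dn in dns:
        db.execute("DELETE FROM objects WHERE dn = ?", (dn,))
        db.commit()
        transactions += 1
    return len(dns), transactions


def delete_subtree(db, root_dn):
    """Delete the root and everything below it in a single transaction.

    Returns (rows_deleted, transactions).
    """
    cur = db.execute(
        "DELETE FROM objects WHERE dn = ? OR dn LIKE ?",
        (root_dn, "%," + root_dn),
    )
    db.commit()
    return cur.rowcount, 1
```

With 100 children under one procedure object, the first function commits 101 times and the second commits once for the same 101 rows.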
>
> /AndersBj
>
> -----Original Message-----
> From: Ingvar Bergström
> Sent: den 19 december 2013 11:06
> To: Anders Björnerstedt; Bertil Engelholm
> Cc: opensaf-devel@lists.sourceforge.net
> Subject: RE: [devel] [PATCH 0 of 1] Review Request for SMF #677
>
> That's a valid question. The difference from the existing solution is
> that the wrapup actions (if any) which are executed at commit will be
> executed with PBE turned on.
> Normally a limited number of IMM operations are defined in the
> campaign wrapup portion of the campaign. That's the reason PBE in the
> proposal is not turned off again for the commit.
> To me the advantage of turning off/on PBE for the commit operation is very
> limited.
>
> /Ingvar
>
> -----Original Message-----
> From: Anders Björnerstedt
> Sent: den 19 december 2013 09:36
> To: Ingvar Bergström; Bertil Engelholm
> Cc: opensaf-devel@lists.sourceforge.net
> Subject: RE: [devel] [PATCH 0 of 1] Review Request for SMF #677
>
> One question on this SMF enhancement.
>
> I understand that PBE will be switched on towards the end of a campaign, but
> earlier than before.
>
> Does this mean that there remains a phase of cleanup/deletes of lots of PRTOs
> to be done after testing?
> If so then *maybe* PBE should be turned off again during the final cleanup
> of campaign PRTOs, then on again.
>
> /AndersBj
>
>
> -----Original Message-----
> From: Ingvar Bergstrom [mailto:ingvar.bergst...@ericsson.com]
> Sent: den 19 december 2013 08:49
> To: Bertil Engelholm
> Cc: opensaf-devel@lists.sourceforge.net
> Subject: [devel] [PATCH 0 of 1] Review Request for SMF #677
>
> Summary: smfd: turn on PBE in state completed
> Review request for Trac Ticket(s): 677
> Peer Reviewer(s): bertil
> Pull request to:
> Affected branch(es): default
> Development branch: default
>
> --------------------------------
> Impacted area       Impact y/n
> --------------------------------
>  Docs                    n
>  Build system            n
>  RPM/packaging           n
>  Configuration files     n
>  Startup scripts         n
>  SAF services            n
>  OpenSAF services        y
>  Core libraries          n
>  Samples                 n
>  Tests                   n
>  Other                   n
>
>
> Comments (indicate scope for each "y" above):
> ---------------------------------------------
>  smfd: turn on PBE in state completed
>
> changeset bf8be15d6ab7daa218e2cf530d0064a0efabf0f5
> Author: Ingvar Bergstrom <ingvar.bergst...@ericsson.com>
> Date:   Thu, 19 Dec 2013 08:45:32 +0100
>
>     smfd: turn on PBE in state completed [677]
>
>     To avoid the need of restore in case of cluster reboot, SMF turn on PBE
>     in completed state.
>
>
> Complete diffstat:
> ------------------
>  osaf/services/saf/smfsv/smfd/SmfCampState.cc       |  14 +++++++++++---
>  osaf/services/saf/smfsv/smfd/SmfUpgradeCampaign.cc |   2 +-
>  2 files changed, 12 insertions(+), 4 deletions(-)
>
>
> Testing Commands:
> -----------------
>
>
> Testing, Expected Results:
> --------------------------
>
>
> Conditions of Submission:
> -------------------------
>
>
> Arch        Built     Started    Linux distro
> -------------------------------------------
> mips          n          n
> mips64        n          n
> x86           n          n
> x86_64        y          y
> powerpc       n          n
> powerpc64     n          n
>
>
> Reviewer Checklist:
> -------------------
> [Submitters: make sure that your review doesn't trigger any checkmarks!]
>
>
> Your checkin has not passed review because (see checked entries):
>
> ___ Your RR template is generally incomplete; it has too many blank entries
>     that need proper data filled in.
>
> ___ You have failed to nominate the proper persons for review and push.
>
> ___ Your patches do not have proper short+long header
>
> ___ You have grammar/spelling in your header that is unacceptable.
>
> ___ You have exceeded a sensible line length in your headers/comments/text.
>
> ___ You have failed to put in a proper Trac Ticket # into your commits.
>
> ___ You have incorrectly put/left internal data in your comments/files
>     (i.e. internal bug tracking tool IDs, product names etc)
>
> ___ You have not given any evidence of testing beyond basic build tests.
>     Demonstrate some level of runtime or other sanity testing.
>
> ___ You have ^M present in some of your files. These have to be removed.
>
> ___ You have needlessly changed whitespace or added whitespace crimes
>     like trailing spaces, or spaces before tabs.
>
> ___ You have mixed real technical changes with whitespace and other
>     cosmetic code cleanup changes. These have to be separate commits.
>
> ___ You need to refactor your submission into logical chunks; there is
>     too much content into a single commit.
>
> ___ You have extraneous garbage in your review (merge commits etc)
>
> ___ You have giant attachments which should never have been sent;
>     Instead you should place your content in a public tree to be pulled.
>
> ___ You have too many commits attached to an e-mail; resend as threaded
>     commits, or place in a public tree for a pull.
>
> ___ You have resent this content multiple times without a clear indication
>     of what has changed between each re-send.
>
> ___ You have failed to adequately and individually address all of the
>     comments and change requests that were proposed in the initial review.
>
> ___ You have a misconfigured ~/.hgrc file (i.e. username, email etc)
>
> ___ Your computer has a badly configured date and time; confusing the
>     threaded patch review.
>
> ___ Your changes affect IPC mechanism, and you don't present any results
>     for in-service upgradability test.
>
> ___ Your changes affect user manual and documentation, your patch series
>     do not contain the patch that updates the Doxygen manual.
>
>
> _______________________________________________
> Opensaf-devel mailing list
> Opensaf-devel@lists.sourceforge.net
> https://lists.sourceforge.net/lists/listinfo/opensaf-devel