This patch only moves the point at which the IMM content becomes persistent. This makes the system more robust during the test period. On this I think we can agree.
What I want to know is whether we should let the few delete operations during commit execute with PBE on, or whether it would be better to temporarily turn PBE off and on again during the commit operation. Removing the PRTOs, or handling them in another way, belongs to a separate enhancement ticket.

So which one would you recommend?
1) Let the (quite few) PRTO delete operations during commit execute with PBE enabled.
2) Turn PBE off and on again for a short time for the few PRTO delete operations during commit.

/Ingvar

-----Original Message-----
From: Anders Björnerstedt
Sent: 19 December 2013 15:26
To: Ingvar Bergström; Bertil Engelholm
Cc: opensaf-devel@lists.sourceforge.net
Subject: RE: [devel] [PATCH 0 of 1] Review Request for SMF #677

I am not worried about performance. That's not the issue. I am worried about robustness (service unavailability) and the degradation of other services.

You say you are not convinced that there is a difference in robustness. Well, I just explained the difference in robustness. The difference is ERR_TIMEOUT versus ERR_TRY_AGAIN. That translates to a difference in robustness.

Of course, what I would really like to see is that you change the SMF design so that it does not use PRTOs at all. You could convert to config data. But more fundamentally, the campaign objects don't really need to be persistent. As I understand it, you start with a campaign file in some XML format (that's the persistent start state). This is then converted to campaign runtime objects, which are derivative objects that contain the same information plus some runtime data that almost never needs to be persistent. For the few cases where you do need to perform a cluster restart step, you should be able to persist the limited amount of runtime data that you really need to have persistent here. That should be possible to do either in a CCB (or possibly in ONE PRTO).
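To make the ERR_TIMEOUT versus ERR_TRY_AGAIN difference concrete, here is a minimal Python sketch of how a client might handle a persistent write. The Rc enum and persistent_write() are hypothetical stand-ins, not the IMM API. The sketch only assumes the general AIS semantics that TRY_AGAIN means the request was not accepted and may safely be repeated, while TIMEOUT leaves it unspecified whether the request took effect.

```python
import enum
import time


class Rc(enum.Enum):
    """Stand-in return codes (hypothetical, not the real IMM constants)."""
    OK = "OK"
    ERR_TRY_AGAIN = "ERR_TRY_AGAIN"
    ERR_TIMEOUT = "ERR_TIMEOUT"


def persistent_write(op, max_retries=5, delay_s=0.0):
    """Call op() and repeat it only while it returns ERR_TRY_AGAIN.

    ERR_TRY_AGAIN: the request was rejected up front, so repeating it is safe.
    ERR_TIMEOUT: the outcome is unknown, so it is handed back to the caller
    instead of being retried blindly.
    Returns (final return code, number of calls made).
    """
    rc = op()
    attempts = 1
    while rc is Rc.ERR_TRY_AGAIN and attempts <= max_retries:
        time.sleep(delay_s)  # back off before the next attempt
        rc = op()
        attempts += 1
    return rc, attempts


if __name__ == "__main__":
    # A service that is busy twice (for example while PBE is being enabled)
    # and then accepts the write.
    replies = iter([Rc.ERR_TRY_AGAIN, Rc.ERR_TRY_AGAIN, Rc.OK])
    print(persistent_write(lambda: next(replies)))
    # A write that times out is not retried by this helper.
    print(persistent_write(lambda: Rc.ERR_TIMEOUT))
```

The point of the sketch is that the TRY_AGAIN path needs nothing more than a loop, whereas the TIMEOUT path forces the caller to find out for itself whether the write happened.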
/AndersBj

-----Original Message-----
From: Ingvar Bergström
Sent: 19 December 2013 15:14
To: Anders Björnerstedt; Bertil Engelholm
Cc: opensaf-devel@lists.sourceforge.net
Subject: RE: [devel] [PATCH 0 of 1] Review Request for SMF #677

The campaigns I have seen typically have one or two procedures. So from SMF there will be one PRTO delete operation for each procedure.
That said, I'm not convinced the system will gain performance/robustness if the PBE is turned off/on (again) during the SMF commit operation.

/Ingvar

-----Original Message-----
From: Anders Björnerstedt
Sent: 19 December 2013 14:57
To: Ingvar Bergström; Bertil Engelholm
Cc: opensaf-devel@lists.sourceforge.net
Subject: RE: [devel] [PATCH 0 of 1] Review Request for SMF #677

How many subtrees would you typically have?
If the entire campaign is ONE tree, you could delete the entire campaign as one cascading delete. That would actually be a better solution than deleting lots of objects individually, since one cascading delete will be one Sqlite write/transaction.

Of course, the only reason you need to delete the persistent structure is that you made it persistent. The irony is that the persistence is a total waste of resources for all campaigns except those where there is a step that has a planned cluster restart.

The term "safe" is relative. What I am worried about is the risk that a burst of individual PRTO writes, on the order of say 100 or more, could hog the PBE indefinitely. The problem here is the queue buildup, where the application would get ERR_TIMEOUT instead of ERR_TRY_AGAIN. Particularly if an SC is restarting and DRBD is syncing at the same time.

Yes, enabling PBE will regenerate the database and that also takes time, but services will get TRY_AGAIN in this state when attempting persistent writes. So this is much better behaved, and services must be designed so that they can deal with this.
But few services are equipped to deal well with getting repeated ERR_TIMEOUT (on persistent write requests).

/AndersBj

-----Original Message-----
From: Ingvar Bergström
Sent: 19 December 2013 14:04
To: Anders Björnerstedt; Bertil Engelholm
Cc: opensaf-devel@lists.sourceforge.net
Subject: RE: [devel] [PATCH 0 of 1] Review Request for SMF #677

If we switch on PBE in completed state, switch it off at the start of commit, and then switch it on again at the end of commit, is this more effective than keeping PBE on during commit? I guess the IMM content is dumped twice if PBE is turned off during commit?

I hope IMM is safe, so what would be safer? I thought performance was the main concern.

I don't know if it matters, but SMF PRTOs are deleted subtree by subtree, where the tops of the trees are the procedure objects. So there is a very limited number of delete operations needed to delete the SMF internal objects.

You see no problem switching PBE on/off/on in sequence? If the user commits the campaign fast, the first PBE on may not be finished before PBE off and then PBE on are received again.

/Ingvar

-----Original Message-----
From: Anders Björnerstedt
Sent: 19 December 2013 11:20
To: Ingvar Bergström; Bertil Engelholm
Cc: opensaf-devel@lists.sourceforge.net
Subject: RE: [devel] [PATCH 0 of 1] Review Request for SMF #677

Well, the problem I am worried about is if there is a large number of PRTOs to be deleted. That is definitely a risky operation in itself with PBE turned on. But if you have deleted the bulk of the SMF PRTOs before the PBE is turned on, then no problem.

The advantage of turning off the PBE again would apply IF there may be a substantial number of PRTOs to be deleted; then that would be safer (less risk of service unavailability).
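The effect of deleting subtree by subtree rather than object by object can be sketched as follows. This is a toy Python model, not SMF or IMM code: the DNs are made-up examples, subtree_roots() is a hypothetical helper, and the sketch assumes that one cascading delete of a subtree root costs one persistent write while individual deletes cost one write per object.

```python
def subtree_roots(dns):
    """Return the DNs that have no ancestor in dns (the tops of the trees).

    Assumes LDAP-style DNs where the parent is everything after the first
    comma, and ignores escaped commas inside RDN values.
    """
    dn_set = set(dns)
    roots = []
    for dn in dns:
        parent = dn.partition(",")[2]
        has_ancestor = False
        while parent:
            if parent in dn_set:
                has_ancestor = True
                break
            parent = parent.partition(",")[2]
        if not has_ancestor:
            roots.append(dn)
    return roots


if __name__ == "__main__":
    # Illustrative runtime objects for a campaign with two procedures.
    prtos = [
        "safSmfProc=p1,safSmfCampaign=c1",
        "safSmfStep=0001,safSmfProc=p1,safSmfCampaign=c1",
        "safSmfStep=0002,safSmfProc=p1,safSmfCampaign=c1",
        "safSmfProc=p2,safSmfCampaign=c1",
        "safSmfStep=0001,safSmfProc=p2,safSmfCampaign=c1",
    ]
    print("individual deletes:", len(prtos))
    print("cascading deletes: ", len(subtree_roots(prtos)))
```

With these example objects the model needs two cascading deletes instead of five individual ones, which matches the "one PRTO delete operation for each procedure" figure mentioned in this thread.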
/AndersBj

-----Original Message-----
From: Ingvar Bergström
Sent: 19 December 2013 11:06
To: Anders Björnerstedt; Bertil Engelholm
Cc: opensaf-devel@lists.sourceforge.net
Subject: RE: [devel] [PATCH 0 of 1] Review Request for SMF #677

That's a valid question.
The difference from the existing solution is that the wrapup actions (if any) which are executed at commit will be executed with PBE turned on. Normally a limited number of IMM operations are defined in the campaign wrapup portion of the campaign. That's the reason PBE in the proposal is not turned off again for the commit.
To me the advantage of turning off/on PBE for the commit operation is very limited.

/Ingvar

-----Original Message-----
From: Anders Björnerstedt
Sent: 19 December 2013 09:36
To: Ingvar Bergström; Bertil Engelholm
Cc: opensaf-devel@lists.sourceforge.net
Subject: RE: [devel] [PATCH 0 of 1] Review Request for SMF #677

One question on this SMF enhancement.
I understand that PBE will be switched on towards the end of a campaign, but earlier than before. Does this mean that there remains a phase of cleanup/deletes of lots of PRTOs to be done after testing?

If so, then *maybe* PBE should be turned off again during the final cleanup of campaign PRTOs, then on again.
/AndersBj

-----Original Message-----
From: Ingvar Bergstrom [mailto:ingvar.bergst...@ericsson.com]
Sent: 19 December 2013 08:49
To: Bertil Engelholm
Cc: opensaf-devel@lists.sourceforge.net
Subject: [devel] [PATCH 0 of 1] Review Request for SMF #677

Summary: smfd: turn on PBE in state completed
Review request for Trac Ticket(s): 677
Peer Reviewer(s): bertil
Pull request to:
Affected branch(es): default
Development branch: default

--------------------------------
Impacted area       Impact y/n
--------------------------------
 Docs                    n
 Build system            n
 RPM/packaging           n
 Configuration files     n
 Startup scripts         n
 SAF services            n
 OpenSAF services        y
 Core libraries          n
 Samples                 n
 Tests                   n
 Other                   n

Comments (indicate scope for each "y" above):
---------------------------------------------
smfd: turn on PBE in state completed

changeset bf8be15d6ab7daa218e2cf530d0064a0efabf0f5
Author: Ingvar Bergstrom <ingvar.bergst...@ericsson.com>
Date:   Thu, 19 Dec 2013 08:45:32 +0100

    smfd: turn on PBE in state completed [677]

    To avoid the need of restore in case of cluster reboot, SMF turn on PBE in
    completed state.

Complete diffstat:
------------------
 osaf/services/saf/smfsv/smfd/SmfCampState.cc       |  14 +++++++++++---
 osaf/services/saf/smfsv/smfd/SmfUpgradeCampaign.cc |   2 +-
 2 files changed, 12 insertions(+), 4 deletions(-)

Testing Commands:
-----------------

Testing, Expected Results:
--------------------------

Conditions of Submission:
-------------------------

Arch        Built     Started    Linux distro
-------------------------------------------
mips          n          n
mips64        n          n
x86           n          n
x86_64        y          y
powerpc       n          n
powerpc64     n          n

Reviewer Checklist:
-------------------
[Submitters: make sure that your review doesn't trigger any checkmarks!]

Your checkin has not passed review because (see checked entries):

___ Your RR template is generally incomplete; it has too many blank entries that need proper data filled in.
___ You have failed to nominate the proper persons for review and push.
___ Your patches do not have proper short+long header
___ You have grammar/spelling in your header that is unacceptable.
___ You have exceeded a sensible line length in your headers/comments/text.
___ You have failed to put in a proper Trac Ticket # into your commits.
___ You have incorrectly put/left internal data in your comments/files (i.e. internal bug tracking tool IDs, product names etc)
___ You have not given any evidence of testing beyond basic build tests. Demonstrate some level of runtime or other sanity testing.
___ You have ^M present in some of your files. These have to be removed.
___ You have needlessly changed whitespace or added whitespace crimes like trailing spaces, or spaces before tabs.
___ You have mixed real technical changes with whitespace and other cosmetic code cleanup changes. These have to be separate commits.
___ You need to refactor your submission into logical chunks; there is too much content in a single commit.
___ You have extraneous garbage in your review (merge commits etc)
___ You have giant attachments which should never have been sent; instead you should place your content in a public tree to be pulled.
___ You have too many commits attached to an e-mail; resend as threaded commits, or place in a public tree for a pull.
___ You have resent this content multiple times without a clear indication of what has changed between each re-send.
___ You have failed to adequately and individually address all of the comments and change requests that were proposed in the initial review.
___ You have a misconfigured ~/.hgrc file (i.e. username, email etc)
___ Your computer has a badly configured date and time, confusing the threaded patch review.
___ Your changes affect IPC mechanism, and you don't present any results for in-service upgradability test.
___ Your changes affect user manual and documentation, and your patch series does not contain the patch that updates the Doxygen manual.
_______________________________________________
Opensaf-devel mailing list
Opensaf-devel@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/opensaf-devel