Hi Anders, Reviewed and tested the patch(Not able to reproduce the problem) ACK.
minor comment: In the patch 2 while logging include ccb ID also, for more clarification: LOG_NO("Ccb not in correct state (%u) for Apply ignoring request", ccb->mState); Thanks, Neel. On Friday 09 August 2013 07:11 PM, Anders Bjornerstedt wrote: > Summary: IMM: CcbApply retry logic made fevs safe [#535] > Review request for Trac Ticket(s): 535 > Peer Reviewer(s): Neel > Pull request to: > Affected branch(es): 4.2, 4.3, default(4.4) > Development branch: > > -------------------------------- > Impacted area Impact y/n > -------------------------------- > Docs n > Build system n > RPM/packaging n > Configuration files n > Startup scripts n > SAF services y > OpenSAF services n > Core libraries n > Samples n > Tests n > Other n > > > Comments (indicate scope for each "y" above): > --------------------------------------------- > Sorry for the previous botched review request. > > Note that the THIRD patch is only for 4.3 and default/4.4 > but NOT for 4.2. > > changeset b4dac755b75231c4e79bf0375ee8b93163a693e4 > Author: Anders Bjornerstedt <anders.bjornerst...@ericsson.com> > Date: Fri, 09 Aug 2013 08:42:01 +0200 > > IMM: CcbApply retry logic made fevs safe [#535] > > The function ImmModel::immNotPbeWritable() is made time insensitive. > Root > cause of the problem was that this function depended on a timeout in > waiting > on ccbs in critical, but the threshold for deciding on this timeout was > not > fevs safe. That is the timeout could be flagged on some nodes but not > others, at the same fevs event i.e. the ccb-apply. The function instead > counts the *number* of ccbs that are in critical i.e. waiting only on > reply > from PBE. > > changeset ea7cad2f1de4cbc3811e5592c98d42cd5e7731e8 > Author: Anders Bjornerstedt <anders.bjornerst...@ericsson.com> > Date: Fri, 09 Aug 2013 09:27:57 +0200 > > IMM: Ignore spurious/redundant ccb-apply from client [#535] > > A spurious and redundant ccb-apply generated by a faulty code elsewhere > shall be ignored by the immnd server. Bugs in the imma library or new > bugs > introduced in the immnd server in the future should never be able to > cause > such a serious error as an inconsistent commit. The problem here was > that > the arrival of the second apply reaching the ccb already in critical at > some > processor(s) caused the ccb to get *aborted* in imm-ram while being in > critical. That should never be allowed to happen. A ccb in critical is > waiting on the commit decision from PBE and PBE alone. Patch also fixes > a > potential source of redundant apply in imma_om_api.c potentially caused > by > mutithreaded applications (incorrect usage of handle). > > changeset 835508b5f0917710219948b9b79da9096ebdf5c1 > Author: Anders Bjornerstedt <anders.bjornerst...@ericsson.com> > Date: Fri, 09 Aug 2013 09:56:53 +0200 > > IMM: Correction to ccb handling in immnd_fevs_local_checks() [#535] > > Patch only for 4.3 and default(4.4). NOT for 4.2 branch > > The logic in immnd_fevs_local_checks() in immnd_evt.c is corrected for > CCB > related messages to use immNotPbeWritable and not immNotWritable() as > the > precheck. The former is unnecessarily strict since the intention is to > allow > ongoing ccbs a period of grace to complete before sync starts, but not > allow > new ccb-id's to be generated. The logic here also did not recognize the > special encoding of TRY_AGAIN needed towards the library for CcbApply. > > > Complete diffstat: > ------------------ > osaf/libs/agents/saf/imma/imma_om_api.c | 9 +++++++++ > osaf/services/saf/immsv/immnd/ImmModel.cc | 34 > ++++++++++++++++++---------------- > osaf/services/saf/immsv/immnd/immnd_evt.c | 25 +++++++++++++++++++++---- > 3 files changed, 48 insertions(+), 20 deletions(-) > > > Testing Commands: > ----------------- > Very difficult to reproduce. > Requires heavy ccb traffic and PBE backlog. > Test applicaiton must have a retry loop arround ccbApply. > Perform immnd sync regularly, for example by killing an immnd. > > Testing, Expected Results: > -------------------------- > Watch for any immnd crashes. > They should of course not happen. > See ticket #535 for details of the symptoms. > > Conditions of Submission: > ------------------------- > Ack from Neel or someone from Oracle. > > > Arch Built Started Linux distro > ------------------------------------------- > mips n n > mips64 n n > x86 n n > x86_64 n n > powerpc n n > powerpc64 n n > > > Reviewer Checklist: > ------------------- > [Submitters: make sure that your review doesn't trigger any checkmarks!] > > > Your checkin has not passed review because (see checked entries): > > ___ Your RR template is generally incomplete; it has too many blank entries > that need proper data filled in. > > ___ You have failed to nominate the proper persons for review and push. > > ___ Your patches do not have proper short+long header > > ___ You have grammar/spelling in your header that is unacceptable. > > ___ You have exceeded a sensible line length in your headers/comments/text. > > ___ You have failed to put in a proper Trac Ticket # into your commits. > > ___ You have incorrectly put/left internal data in your comments/files > (i.e. internal bug tracking tool IDs, product names etc) > > ___ You have not given any evidence of testing beyond basic build tests. > Demonstrate some level of runtime or other sanity testing. > > ___ You have ^M present in some of your files. These have to be removed. > > ___ You have needlessly changed whitespace or added whitespace crimes > like trailing spaces, or spaces before tabs. > > ___ You have mixed real technical changes with whitespace and other > cosmetic code cleanup changes. These have to be separate commits. > > ___ You need to refactor your submission into logical chunks; there is > too much content into a single commit. > > ___ You have extraneous garbage in your review (merge commits etc) > > ___ You have giant attachments which should never have been sent; > Instead you should place your content in a public tree to be pulled. > > ___ You have too many commits attached to an e-mail; resend as threaded > commits, or place in a public tree for a pull. > > ___ You have resent this content multiple times without a clear indication > of what has changed between each re-send. > > ___ You have failed to adequately and individually address all of the > comments and change requests that were proposed in the initial review. > > ___ You have a misconfigured ~/.hgrc file (i.e. username, email etc) > > ___ Your computer have a badly configured date and time; confusing the > the threaded patch review. > > ___ Your changes affect IPC mechanism, and you don't present any results > for in-service upgradability test. > > ___ Your changes affect user manual and documentation, your patch series > do not contain the patch that updates the Doxygen manual. > ------------------------------------------------------------------------------ Introducing Performance Central, a new site from SourceForge and AppDynamics. Performance Central is your source for news, insights, analysis and resources for efficient Application Performance Management. Visit us today! http://pubads.g.doubleclick.net/gampad/clk?id=48897511&iu=/4140/ostg.clktrk _______________________________________________ Opensaf-devel mailing list Opensaf-devel@lists.sourceforge.net https://lists.sourceforge.net/lists/listinfo/opensaf-devel