Hi Anders,

Reviewed and tested the patch(Not able to reproduce the problem)
ACK.

minor comment:

In the patch 2  while logging include ccb ID also, for more clarification:

LOG_NO("Ccb not in correct state (%u) for Apply ignoring request", ccb->mState);


Thanks,
Neel.

On Friday 09 August 2013 07:11 PM, Anders Bjornerstedt wrote:
> Summary: IMM: CcbApply retry logic made fevs safe [#535]
> Review request for Trac Ticket(s): 535
> Peer Reviewer(s): Neel
> Pull request to:
> Affected branch(es): 4.2, 4.3, default(4.4)
> Development branch:
>
> --------------------------------
> Impacted area       Impact y/n
> --------------------------------
>   Docs                    n
>   Build system            n
>   RPM/packaging           n
>   Configuration files     n
>   Startup scripts         n
>   SAF services            y
>   OpenSAF services        n
>   Core libraries          n
>   Samples                 n
>   Tests                   n
>   Other                   n
>
>
> Comments (indicate scope for each "y" above):
> ---------------------------------------------
> Sorry for the previous botched review request.
>
> Note that the THIRD  patch is only for 4.3 and default/4.4
> but NOT for 4.2.
>
> changeset b4dac755b75231c4e79bf0375ee8b93163a693e4
> Author:       Anders Bjornerstedt <anders.bjornerst...@ericsson.com>
> Date: Fri, 09 Aug 2013 08:42:01 +0200
>
>       IMM: CcbApply retry logic made fevs safe [#535]
>
>       The function ImmModel::immNotPbeWritable() is made time insensitive. 
> Root
>       cause of the problem was that this function depended on a timeout in 
> waiting
>       on ccbs in critical, but the threshold for deciding on this timeout was 
> not
>       fevs safe. That is the timeout could be flagged on some nodes but not
>       others, at the same fevs event i.e. the ccb-apply. The function instead
>       counts the *number* of ccbs that are in critical i.e. waiting only on 
> reply
>       from PBE.
>
> changeset ea7cad2f1de4cbc3811e5592c98d42cd5e7731e8
> Author:       Anders Bjornerstedt <anders.bjornerst...@ericsson.com>
> Date: Fri, 09 Aug 2013 09:27:57 +0200
>
>       IMM: Ignore spurious/redundant ccb-apply from client [#535]
>
>       A spurious and redundant ccb-apply generated by a faulty code elsewhere
>       shall be ignored by the immnd server. Bugs in the imma library or new 
> bugs
>       introduced in the immnd server in the future should never be able to 
> cause
>       such a serious error as an inconsistent commit. The problem here was 
> that
>       the arrival of the second apply reaching the ccb already in critical at 
> some
>       processor(s) caused the ccb to get *aborted* in imm-ram while being in
>       critical. That should never be allowed to happen. A ccb in critical is
>       waiting on the commit decision from PBE and PBE alone. Patch also fixes 
> a
>       potential source of redundant apply in imma_om_api.c potentially caused 
> by
>       mutithreaded applications (incorrect usage of handle).
>
> changeset 835508b5f0917710219948b9b79da9096ebdf5c1
> Author:       Anders Bjornerstedt <anders.bjornerst...@ericsson.com>
> Date: Fri, 09 Aug 2013 09:56:53 +0200
>
>       IMM: Correction to ccb handling in immnd_fevs_local_checks() [#535]
>
>       Patch only for 4.3 and default(4.4). NOT for 4.2 branch
>
>       The logic in immnd_fevs_local_checks() in immnd_evt.c is corrected for 
> CCB
>       related messages to use immNotPbeWritable and not immNotWritable() as 
> the
>       precheck. The former is unnecessarily strict since the intention is to 
> allow
>       ongoing ccbs a period of grace to complete before sync starts, but not 
> allow
>       new ccb-id's to be generated. The logic here also did not recognize the
>       special encoding of TRY_AGAIN needed towards the library for CcbApply.
>
>
> Complete diffstat:
> ------------------
>   osaf/libs/agents/saf/imma/imma_om_api.c   |   9 +++++++++
>   osaf/services/saf/immsv/immnd/ImmModel.cc |  34 
> ++++++++++++++++++----------------
>   osaf/services/saf/immsv/immnd/immnd_evt.c |  25 +++++++++++++++++++++----
>   3 files changed, 48 insertions(+), 20 deletions(-)
>
>
> Testing Commands:
> -----------------
> Very difficult to reproduce.
> Requires heavy ccb traffic and PBE backlog.
> Test applicaiton must have a retry loop arround ccbApply.
> Perform immnd sync regularly, for example by killing an immnd.
>
> Testing, Expected Results:
> --------------------------
> Watch for any immnd crashes.
> They should of course not happen.
> See ticket #535 for details of the symptoms.
>
> Conditions of Submission:
> -------------------------
> Ack from Neel or someone from Oracle.
>
>
> Arch      Built     Started    Linux distro
> -------------------------------------------
> mips        n          n
> mips64      n          n
> x86         n          n
> x86_64      n          n
> powerpc     n          n
> powerpc64   n          n
>
>
> Reviewer Checklist:
> -------------------
> [Submitters: make sure that your review doesn't trigger any checkmarks!]
>
>
> Your checkin has not passed review because (see checked entries):
>
> ___ Your RR template is generally incomplete; it has too many blank entries
>      that need proper data filled in.
>
> ___ You have failed to nominate the proper persons for review and push.
>
> ___ Your patches do not have proper short+long header
>
> ___ You have grammar/spelling in your header that is unacceptable.
>
> ___ You have exceeded a sensible line length in your headers/comments/text.
>
> ___ You have failed to put in a proper Trac Ticket # into your commits.
>
> ___ You have incorrectly put/left internal data in your comments/files
>      (i.e. internal bug tracking tool IDs, product names etc)
>
> ___ You have not given any evidence of testing beyond basic build tests.
>      Demonstrate some level of runtime or other sanity testing.
>
> ___ You have ^M present in some of your files. These have to be removed.
>
> ___ You have needlessly changed whitespace or added whitespace crimes
>      like trailing spaces, or spaces before tabs.
>
> ___ You have mixed real technical changes with whitespace and other
>      cosmetic code cleanup changes. These have to be separate commits.
>
> ___ You need to refactor your submission into logical chunks; there is
>      too much content into a single commit.
>
> ___ You have extraneous garbage in your review (merge commits etc)
>
> ___ You have giant attachments which should never have been sent;
>      Instead you should place your content in a public tree to be pulled.
>
> ___ You have too many commits attached to an e-mail; resend as threaded
>      commits, or place in a public tree for a pull.
>
> ___ You have resent this content multiple times without a clear indication
>      of what has changed between each re-send.
>
> ___ You have failed to adequately and individually address all of the
>      comments and change requests that were proposed in the initial review.
>
> ___ You have a misconfigured ~/.hgrc file (i.e. username, email etc)
>
> ___ Your computer have a badly configured date and time; confusing the
>      the threaded patch review.
>
> ___ Your changes affect IPC mechanism, and you don't present any results
>      for in-service upgradability test.
>
> ___ Your changes affect user manual and documentation, your patch series
>      do not contain the patch that updates the Doxygen manual.
>


------------------------------------------------------------------------------
Introducing Performance Central, a new site from SourceForge and 
AppDynamics. Performance Central is your source for news, insights, 
analysis and resources for efficient Application Performance Management. 
Visit us today!
http://pubads.g.doubleclick.net/gampad/clk?id=48897511&iu=/4140/ostg.clktrk
_______________________________________________
Opensaf-devel mailing list
Opensaf-devel@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/opensaf-devel

Reply via email to