Ack.

Thanks,
Ramesh.

On 5/2/2014 3:40 AM, mathi.naic...@oracle.com wrote:
> Summary: clm: avoid any functional processing from impl_set thread #800
> Review request for Trac Ticket(s): #800
> Peer Reviewer(s): ramesh.bet...@oracle.com; tony hart @btisystems
> Pull request to: <<LIST THE PERSON WITH PUSH ACCESS HERE>>
> Affected branch(es): opensaf-4.3.x, 4.4.x, default
> Development branch: <<IF ANY GIVE THE REPO URL>>
>
> --------------------------------
> Impacted area       Impact y/n
> --------------------------------
>   Docs                    n
>   Build system            n
>   RPM/packaging           n
>   Configuration files     n
>   Startup scripts         n
>   SAF services            y
>   OpenSAF services        n
>   Core libraries          n
>   Samples                 n
>   Tests                   n
>   Other                   n
>
>
> Comments (indicate scope for each "y" above):
> ---------------------------------------------
> Tony Hart reported an assert, as described in ticket #800.
> The test case involves extracting a payload during a
> controller failover, in a scaled-up cluster.
>
> The root cause is that two threads were unintentionally accessing
> the same node information.
> The patch avoids any CLM internal processing from within
> the implementer set thread.
>
> Tony has tested this patch and it works fine.
>
> changeset 82f100696626999dbe58ee6d00cdce953d55239d
> Author:       Mathivanan N.P.<mathi.naic...@oracle.com>
> Date: Thu, 01 May 2014 18:01:59 -0400
>
>       clm: process node down outside impl_set thread to avoid race [#800] The
>       standby CLM queues up node_downs and clears the queue of unprocessed
>       entries after becoming active. During role change (controller failover),
>       the standby processes these entries, but unintentionally does so from a
>       separate thread, i.e. from within the implementer set thread. This
>       results in a scenario where two threads can try to update the same node
>       entry. The patch serialises the processing of node_down events during
>       controller role change by moving the node_down processing out of the
>       implementer set thread. In one user's setup (Tony Hart), the issue was
>       reproducible: during a failover, while the new ACTIVE was processing the
>       NODE_DOWN of the previous active, the main thread was processing the
>       NODE_DOWN of a payload node. One of the threads deleted a node that was
>       being accessed by the other thread. The problem was that the node_down
>       processing during failover should not have been done from within the
>       implementer set thread. This was a mistake. This patch removes any extra
>       processing from the implementer set thread. The patch was tested by
>       Tony Hart @btisystems and works fine.
>
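
The serialisation described in the first changeset can be illustrated with a minimal sketch. This is not the CLM code: `mailbox_t`, `mailbox_post`, `mailbox_wait`, `impl_set_thread`, `node_present` and `run_failover_demo` are hypothetical names invented for illustration, and a plain pthread mutex/condition-variable queue stands in for whatever mailbox the service actually uses. It only shows the pattern: the secondary thread hands node_down events to the main thread instead of touching node state itself, so a node entry is only ever updated from one thread.

```c
#include <pthread.h>
#include <stdlib.h>

/* Hypothetical event type; not the real CLM structures. */
typedef struct node_down_evt {
    unsigned node_id;
    struct node_down_evt *next;
} node_down_evt_t;

/* FIFO mailbox drained only by the main thread. */
typedef struct {
    pthread_mutex_t lock;
    pthread_cond_t nonempty;
    node_down_evt_t *head, *tail;
} mailbox_t;

/* Callable from any thread: queue the event instead of processing it. */
static int mailbox_post(mailbox_t *mb, unsigned node_id)
{
    node_down_evt_t *e = malloc(sizeof *e);
    if (e == NULL)
        return -1;
    e->node_id = node_id;
    e->next = NULL;
    pthread_mutex_lock(&mb->lock);
    if (mb->tail != NULL)
        mb->tail->next = e;
    else
        mb->head = e;
    mb->tail = e;
    pthread_cond_signal(&mb->nonempty);
    pthread_mutex_unlock(&mb->lock);
    return 0;
}

/* Called only from the main thread: block until an event is available. */
static unsigned mailbox_wait(mailbox_t *mb)
{
    pthread_mutex_lock(&mb->lock);
    while (mb->head == NULL)
        pthread_cond_wait(&mb->nonempty, &mb->lock);
    node_down_evt_t *e = mb->head;
    mb->head = e->next;
    if (mb->head == NULL)
        mb->tail = NULL;
    pthread_mutex_unlock(&mb->lock);
    unsigned id = e->node_id;
    free(e);
    return id;
}

#define MAX_NODES 8
static int node_present[MAX_NODES]; /* read and written by the main thread only */

/* Stand-in for the implementer set thread: it posts the queued node_downs
 * and never touches node_present itself. */
static void *impl_set_thread(void *arg)
{
    mailbox_t *mb = arg;
    if (mailbox_post(mb, 1) != 0 || mailbox_post(mb, 3) != 0)
        abort();
    return NULL;
}

/* Returns the number of node entries removed, or -1 on setup failure. */
int run_failover_demo(void)
{
    mailbox_t mb;
    pthread_t tid;
    int removed = 0;

    pthread_mutex_init(&mb.lock, NULL);
    pthread_cond_init(&mb.nonempty, NULL);
    mb.head = mb.tail = NULL;
    for (int i = 0; i < MAX_NODES; i++)
        node_present[i] = 1;

    if (pthread_create(&tid, NULL, impl_set_thread, &mb) != 0)
        return -1;
    /* The main thread independently learns that node 3 went down. */
    if (mailbox_post(&mb, 3) != 0)
        abort();

    /* Three events were posted in total; all are handled here, one by one. */
    for (int i = 0; i < 3; i++) {
        unsigned id = mailbox_wait(&mb);
        if (id < MAX_NODES && node_present[id]) {
            node_present[id] = 0; /* single writer, so no double delete */
            removed++;
        }
    }
    pthread_join(tid, NULL);
    pthread_cond_destroy(&mb.nonempty);
    pthread_mutex_destroy(&mb.lock);
    return removed;
}
```

In this sketch the duplicate NODE_DOWN for node 3 is simply ignored the second time, whichever thread posted it first, because only the main thread ever removes an entry.
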
> changeset c3c88c3ada9ab4865ed5267acbc7ad6305933648
> Author:       Mathivanan N.P.<mathi.naic...@oracle.com>
> Date: Thu, 01 May 2014 18:02:18 -0400
>
>       clm: setup mds and mbcsv role first during role change [#800] I think it
>       is best to set up the mds and mbcsv role first, before doing any
>       functional processing during role change.
>
>
> Complete diffstat:
> ------------------
>   osaf/services/saf/clmsv/clms/clms.h     |   1 +
>   osaf/services/saf/clmsv/clms/clms_amf.c |  17 +++++++++++------
>   osaf/services/saf/clmsv/clms/clms_evt.c |  40 ++++++++++++++++++++++++++++++++++------
>   osaf/services/saf/clmsv/clms/clms_imm.c |  18 ------------------
>   4 files changed, 46 insertions(+), 30 deletions(-)
>
>
> Testing Commands:
> -----------------
> Perform controller failover and at the same time, reboot one or more payloads.
> Without the patch, the assert in CLM should get hit.
> Note: The assert is just one manifestation of the problem.
>
> Testing, Expected Results:
> --------------------------
> Perform controller failover and at the same time, reboot one or more payloads.
> With the patch applied, the assert in CLM should not be observed.
> The issue has been observed only in Tony's setup.
>
> Conditions of Submission:
> -------------------------
> Ack from Ramesh (and/or Tony).
>
> Arch      Built     Started    Linux distro
> -------------------------------------------
> mips        n          n
> mips64      n          n
> x86         n          n
> x86_64      y          y
> powerpc     n          n
> powerpc64   n          n
>
>
> Reviewer Checklist:
> -------------------
> [Submitters: make sure that your review doesn't trigger any checkmarks!]
>
>
> Your checkin has not passed review because (see checked entries):
>
> ___ Your RR template is generally incomplete; it has too many blank entries
>      that need proper data filled in.
>
> ___ You have failed to nominate the proper persons for review and push.
>
> ___ Your patches do not have proper short+long header
>
> ___ You have grammar/spelling in your header that is unacceptable.
>
> ___ You have exceeded a sensible line length in your headers/comments/text.
>
> ___ You have failed to put in a proper Trac Ticket # into your commits.
>
> ___ You have incorrectly put/left internal data in your comments/files
>      (i.e. internal bug tracking tool IDs, product names etc)
>
> ___ You have not given any evidence of testing beyond basic build tests.
>      Demonstrate some level of runtime or other sanity testing.
>
> ___ You have ^M present in some of your files. These have to be removed.
>
> ___ You have needlessly changed whitespace or added whitespace crimes
>      like trailing spaces, or spaces before tabs.
>
> ___ You have mixed real technical changes with whitespace and other
>      cosmetic code cleanup changes. These have to be separate commits.
>
> ___ You need to refactor your submission into logical chunks; there is
>      too much content in a single commit.
>
> ___ You have extraneous garbage in your review (merge commits etc)
>
> ___ You have giant attachments which should never have been sent;
>      Instead you should place your content in a public tree to be pulled.
>
> ___ You have too many commits attached to an e-mail; resend as threaded
>      commits, or place in a public tree for a pull.
>
> ___ You have resent this content multiple times without a clear indication
>      of what has changed between each re-send.
>
> ___ You have failed to adequately and individually address all of the
>      comments and change requests that were proposed in the initial review.
>
> ___ You have a misconfigured ~/.hgrc file (i.e. username, email etc)
>
> ___ Your computer has a badly configured date and time, confusing the
>      threaded patch review.
>
> ___ Your changes affect IPC mechanism, and you don't present any results
>      for in-service upgradability test.
>
> ___ Your changes affect user manual and documentation, your patch series
>      do not contain the patch that updates the Doxygen manual.
>


_______________________________________________
Opensaf-devel mailing list
Opensaf-devel@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/opensaf-devel
