Ack. Thanks, Ramesh.
On 5/2/2014 3:40 AM, mathi.naic...@oracle.com wrote:
> Summary: clm: avoid any functional processing from impl_set thread #800
> Review request for Trac Ticket(s): #800
> Peer Reviewer(s): ramesh.bet...@oracle.com; tony hart @btisystems
> Pull request to: <<LIST THE PERSON WITH PUSH ACCESS HERE>>
> Affected branch(es): opensaf-4.3.x, 4.4.x, default
> Development branch: <<IF ANY GIVE THE REPO URL>>
>
> --------------------------------
> Impacted area       Impact y/n
> --------------------------------
>  Docs                    n
>  Build system            n
>  RPM/packaging           n
>  Configuration files     n
>  Startup scripts         n
>  SAF services            y
>  OpenSAF services        n
>  Core libraries          n
>  Samples                 n
>  Tests                   n
>  Other                   n
>
>
> Comments (indicate scope for each "y" above):
> ---------------------------------------------
> Tony Hart reported an assert as mentioned in ticket #800.
> The test case involves a payload extraction that happens at the
> same time as a controller failover, in a scaled-up cluster.
>
> The root cause is that two threads were unintentionally accessing
> the same node information.
> The patch avoids any CLM internal processing from within
> the implementer set thread.
>
> Tony has tested this patch and it works fine.
>
> changeset 82f100696626999dbe58ee6d00cdce953d55239d
> Author: Mathivanan N.P.<mathi.naic...@oracle.com>
> Date: Thu, 01 May 2014 18:01:59 -0400
>
> clm: process node down outside impl_set thread to avoid race [#800]
>
> The standby CLM queues up node_downs and clears the queue of
> unprocessed entries after becoming active. During a role change
> (controller failover), the standby processes these entries, but
> unintentionally from a separate thread, i.e. from within the
> implementer set thread. This results in a scenario where two threads
> can try to update the same node entry. The patch serialises the
> processing of node_down events during a controller role change by
> moving the node_down processing out of the implementer_set thread.
> In one user's setup (Tony Hart), this issue was reproducible: during a
> failover, while the new ACTIVE was processing the NODE_DOWN of the
> previous active, the main thread was processing the NODE_DOWN of a
> payload node. One of the threads deleted a node that was being accessed
> by the other thread. The problem was that the node_down processing
> during failover should not have been done from within the implementer
> set thread. This was a mistake. This patch removes any extra processing
> from the implementer set thread. The patch has been tested by Tony Hart
> @btisystems and works fine.
>
> changeset c3c88c3ada9ab4865ed5267acbc7ad6305933648
> Author: Mathivanan N.P.<mathi.naic...@oracle.com>
> Date: Thu, 01 May 2014 18:02:18 -0400
>
> clm: setup mds and mbcsv role first during role change [#800]
>
> I think it is best to set up the mds and mbcsv role first, before doing
> any functional processing during role change.
>
>
> Complete diffstat:
> ------------------
>  osaf/services/saf/clmsv/clms/clms.h     |   1 +
>  osaf/services/saf/clmsv/clms/clms_amf.c |  17 +++++++++++------
>  osaf/services/saf/clmsv/clms/clms_evt.c |  40 ++++++++++++++++++++++++++++++++++------
>  osaf/services/saf/clmsv/clms/clms_imm.c |  18 ------------------
>  4 files changed, 46 insertions(+), 30 deletions(-)
>
>
> Testing Commands:
> -----------------
> Perform controller failover and at the same time, reboot one or more payloads.
> The assert in CLM should get hit.
> Note: The assert is just one manifestation of the problem.
>
> Testing, Expected Results:
> --------------------------
> Perform controller failover and at the same time, reboot one or more payloads.
> With the patch applied, the assert should not be observed.
> The issue has been observed only in Tony's setup.
>
> Conditions of Submission:
> -------------------------
> Ack from Ramesh (and/or Tony).
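
For readers of the archive, the serialisation described in the first
changeset can be sketched roughly as below. This is only an illustration
under stated assumptions, not the code from the patch: every name
(server_t, post_evt, EVT_DRAIN_QUEUED_NODE_DOWNS, run_failover_demo and so
on) is hypothetical, and a plain mutex-protected list stands in for the
mailbox that CLM really uses. The point it shows is that the implementer
set thread only posts an event, so node entries are modified from the main
thread alone.

```c
#include <assert.h>
#include <pthread.h>
#include <stdbool.h>
#include <stddef.h>

#define MAX_NODES 8

/* Hypothetical event types; not the real CLM event enum. */
typedef enum { EVT_NODE_DOWN, EVT_DRAIN_QUEUED_NODE_DOWNS } evt_type_t;

typedef struct evt {
    evt_type_t type;
    unsigned node_id;                /* used by EVT_NODE_DOWN only */
    struct evt *next;
} evt_t;

typedef struct {
    pthread_mutex_t lock;            /* protects the mailbox list only */
    evt_t *head, *tail;
    bool node_present[MAX_NODES];    /* touched by the main thread only */
    unsigned queued_down[MAX_NODES]; /* node_downs queued while standby */
    size_t queued_count;
} server_t;

/* Append an event to the mailbox; safe to call from any thread. */
static void post_evt(server_t *s, evt_t *e)
{
    e->next = NULL;
    pthread_mutex_lock(&s->lock);
    if (s->tail) s->tail->next = e; else s->head = e;
    s->tail = e;
    pthread_mutex_unlock(&s->lock);
}

/* Remove and return the oldest event, or NULL if the mailbox is empty. */
static evt_t *take_evt(server_t *s)
{
    pthread_mutex_lock(&s->lock);
    evt_t *e = s->head;
    if (e) {
        s->head = e->next;
        if (!s->head) s->tail = NULL;
    }
    pthread_mutex_unlock(&s->lock);
    return e;
}

/* Implementer set thread: does its own job, then only posts an event. */
static void *impl_set_thread(void *arg)
{
    static evt_t drain = { .type = EVT_DRAIN_QUEUED_NODE_DOWNS };
    /* ... becoming the implementer would happen here (omitted) ... */
    post_evt((server_t *)arg, &drain);
    return NULL;
}

/* Main thread: the only place where node entries are modified. */
static void main_thread_dispatch(server_t *s)
{
    evt_t *e;
    while ((e = take_evt(s)) != NULL) {
        if (e->type == EVT_NODE_DOWN) {
            assert(e->node_id < MAX_NODES);
            s->node_present[e->node_id] = false;
        } else {
            for (size_t i = 0; i < s->queued_count; i++)
                s->node_present[s->queued_down[i]] = false;
            s->queued_count = 0;
        }
    }
}

/* Simulated failover; returns how many nodes are still present. */
static int run_failover_demo(void)
{
    server_t s = { .head = NULL, .tail = NULL };
    pthread_mutex_init(&s.lock, NULL);
    s.node_present[1] = s.node_present[2] = s.node_present[3] = true;
    s.queued_down[s.queued_count++] = 1;  /* previous active, queued on standby */

    evt_t payload_down = { .type = EVT_NODE_DOWN, .node_id = 2 };
    pthread_t tid;
    pthread_create(&tid, NULL, impl_set_thread, &s);
    post_evt(&s, &payload_down);          /* payload reboot arrives concurrently */
    pthread_join(tid, NULL);

    main_thread_dispatch(&s);             /* both node_downs handled serially */
    pthread_mutex_destroy(&s.lock);

    int remaining = 0;
    for (int i = 0; i < MAX_NODES; i++)
        remaining += s.node_present[i];
    return remaining;
}
```

In this sketch the order in which the two events reach the mailbox can
vary between runs, but both are handled by the same thread, so neither
handler can delete a node entry while the other is still using it.
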
>
> Arch      Built     Started    Linux distro
> -------------------------------------------
> mips        n          n
> mips64      n          n
> x86         n          n
> x86_64      y          y
> powerpc     n          n
> powerpc64   n          n
>
>
> Reviewer Checklist:
> -------------------
> [Submitters: make sure that your review doesn't trigger any checkmarks!]
>
>
> Your checkin has not passed review because (see checked entries):
>
> ___ Your RR template is generally incomplete; it has too many blank entries
>     that need proper data filled in.
>
> ___ You have failed to nominate the proper persons for review and push.
>
> ___ Your patches do not have proper short+long header
>
> ___ You have grammar/spelling in your header that is unacceptable.
>
> ___ You have exceeded a sensible line length in your headers/comments/text.
>
> ___ You have failed to put in a proper Trac Ticket # into your commits.
>
> ___ You have incorrectly put/left internal data in your comments/files
>     (i.e. internal bug tracking tool IDs, product names etc)
>
> ___ You have not given any evidence of testing beyond basic build tests.
>     Demonstrate some level of runtime or other sanity testing.
>
> ___ You have ^M present in some of your files. These have to be removed.
>
> ___ You have needlessly changed whitespace or added whitespace crimes
>     like trailing spaces, or spaces before tabs.
>
> ___ You have mixed real technical changes with whitespace and other
>     cosmetic code cleanup changes. These have to be separate commits.
>
> ___ You need to refactor your submission into logical chunks; there is
>     too much content into a single commit.
>
> ___ You have extraneous garbage in your review (merge commits etc)
>
> ___ You have giant attachments which should never have been sent;
>     Instead you should place your content in a public tree to be pulled.
>
> ___ You have too many commits attached to an e-mail; resend as threaded
>     commits, or place in a public tree for a pull.
>
> ___ You have resent this content multiple times without a clear indication
>     of what has changed between each re-send.
>
> ___ You have failed to adequately and individually address all of the
>     comments and change requests that were proposed in the initial review.
>
> ___ You have a misconfigured ~/.hgrc file (i.e. username, email etc)
>
> ___ Your computer has a badly configured date and time, confusing the
>     threaded patch review.
>
> ___ Your changes affect the IPC mechanism, and you don't present any results
>     for in-service upgradability test.
>
> ___ Your changes affect the user manual and documentation, but your patch
>     series does not contain the patch that updates the Doxygen manual.