Can I have some feedback on this series?

Thanks,
Hans

On 02/28/2014 08:54 AM, Hans Feldt wrote:
> Summary: Correct AMF support for TERM-FAILED
> Review request for Trac Ticket(s): 538
> Peer Reviewer(s): Praveen, Nags, Hans N
> Pull request to: <<LIST THE PERSON WITH PUSH ACCESS HERE>>
> Affected branch(es): All
> Development branch: default
>
> --------------------------------
> Impacted area       Impact y/n
> --------------------------------
>  Docs                    n
>  Build system            n
>  RPM/packaging           n
>  Configuration files     n
>  Startup scripts         n
>  SAF services            y
>  OpenSAF services        n
>  Core libraries          n
>  Samples                 n
>  Tests                   n
>  Other                   n
>
>
> Comments (indicate scope for each "y" above):
> ---------------------------------------------
>
> It is very important to get this into the pending releases!
>
>
> changeset 7f72f8d9cbd64fa017a71aa73337b8d74128ade8
> Author: Hans Feldt <[email protected]>
> Date:   Fri, 28 Feb 2014 08:12:11 +0100
>
>     amfd: allow modification of node repair attributes [#538]
>
>     To prepare for correct handling of TERMINATION-FAILED it is important
>     that all the repair related attributes of the AMF system model can be
>     changed.
>
>     This patch allows changing saAmfNodeAutoRepair and
>     saAmfNodeFailfastOnTerminationFailure and also logs such change to SAF
>     LOG.
>
> changeset 5069ae52df6a857f374c93dcee4dc364f9f4fd0a
> Author: Hans Feldt <[email protected]>
> Date:   Fri, 28 Feb 2014 08:20:51 +0100
>
>     amfd: reboot node when term-failed SU [#538]
>
>     When a component enters the TERM-FAILED presence state and if all the
>     repair conditions on SG and node are true, a node reboot request is
>     ordered. The comp presence state is also SAFlogged.
>
> changeset 785f74ff482ef8e6f644f95cd1064b2d22a86ab1
> Author: Hans Feldt <[email protected]>
> Date:   Fri, 28 Feb 2014 08:24:08 +0100
>
>     amfnd: correct term-failed behaviour [#538]
>
>     Problem: possible split brain on application level and spec violation.
>
>     Analysis: The AMF node director requests a comp/SU failover from the AMF
>     director even though a comp is in the TERM-FAILED presence state.
>
>     Change: Correct this behavior and just disable the SU and let the AMF
>     director handle possible node reboot or manual repair.
>
> changeset f56cac35542db8d592e48c758269bb5418aced38
> Author: Hans Feldt <[email protected]>
> Date:   Fri, 28 Feb 2014 08:35:29 +0100
>
>     amfd: auto clear comp cleanup failed alarm [#538]
>
>
> Complete diffstat:
> ------------------
>  osaf/services/saf/amf/amfd/comp.cc        |  44 +++++++++++++++++++++++++++++++++++++-------
>  osaf/services/saf/amf/amfd/include/util.h |   2 ++
>  osaf/services/saf/amf/amfd/node.cc        |  27 +++++++++++++++++++++++++++
>  osaf/services/saf/amf/amfd/sg.cc          |   4 ++++
>  osaf/services/saf/amf/amfd/sgproc.cc      |  38 --------------------------------------
>  osaf/services/saf/amf/amfd/sgtype.cc      |   6 ++++++
>  osaf/services/saf/amf/amfd/util.cc        |  38 ++++++++++++++++++++++++++++++++++++++
>  osaf/services/saf/amf/amfnd/clc.cc        |   3 +--
>  osaf/services/saf/amf/amfnd/su.cc         |   1 -
>  osaf/services/saf/amf/amfnd/susm.cc       |  45 +++++++--------------------------------------
>  10 files changed, 122 insertions(+), 86 deletions(-)
>
>
> Testing Commands:
> -----------------
>
> Case 1:
> ============
> 2 node cluster, amf demo and the following script run on SC1 (active SC and
> active demo):
>
> immcfg -f AppConfig-2N.xml
> amf-adm unlock-in safSu=SU1,safSg=AmfDemo,safApp=AmfDemo1
> amf-adm unlock-in safSu=SU2,safSg=AmfDemo,safApp=AmfDemo1
> amf-adm unlock safSu=SU1,safSg=AmfDemo,safApp=AmfDemo1
> amf-adm unlock safSu=SU2,safSg=AmfDemo,safApp=AmfDemo1
> sleep 2
>
> immcfg -a saAmfSGAutoRepair=1 safSg=AmfDemo,safApp=AmfDemo1
> immcfg -a saAmfNodeAutoRepair=1 safAmfNode=SC-1,safAmfCluster=myAmfCluster
> immcfg -a saAmfNodeFailfastOnTerminationFailure=1 safAmfNode=SC-1,safAmfCluster=myAmfCluster
> immcfg -a saAmfNodeAutoRepair=1 safAmfNode=SC-2,safAmfCluster=myAmfCluster
> immcfg -a saAmfNodeFailfastOnTerminationFailure=1 safAmfNode=SC-2,safAmfCluster=myAmfCluster
>
> pkill demo
>
> Case 2:
> ===========
> The same, but with saAmfSGAutoRepair=0 and an admin repair of the SU
>
>
> Testing, Expected Results:
> --------------------------
>
> Case 1:
> ===============
> SC1 rebooted
> demo failed over to SC2
> "component cleanup failed" alarm raised and cleared
> New SAF LOGs to visualize important changes:
>
> 80 08:29:56 02/28/2014 NO safApp=safAmfService "CCB 3 Modified safSg=AmfDemo,safApp=AmfDemo1
> 81 08:29:56 02/28/2014 NO safApp=safAmfService "safSg=AmfDemo,safApp=AmfDemo1 saAmfSGAutoRepair changed to 1
> 82 08:29:56 02/28/2014 NO safApp=safAmfService "CCB 4 Modified safAmfNode=SC-1,safAmfCluster=myAmfCluster
> 83 08:29:56 02/28/2014 NO safApp=safAmfService "safAmfNode=SC-1,safAmfCluster=myAmfCluster saAmfNodeAutoRepair changed to 1
> 84 08:29:56 02/28/2014 NO safApp=safAmfService "CCB 5 Modified safAmfNode=SC-1,safAmfCluster=myAmfCluster
> 85 08:29:56 02/28/2014 NO safApp=safAmfService "safAmfNode=SC-1,safAmfCluster=myAmfCluster saAmfNodeFailfastOnTerminationFailure changed to 1
> 86 08:29:57 02/28/2014 NO safApp=safAmfService "CCB 6 Modified safAmfNode=SC-2,safAmfCluster=myAmfCluster
> 87 08:29:57 02/28/2014 NO safApp=safAmfService "safAmfNode=SC-2,safAmfCluster=myAmfCluster saAmfNodeAutoRepair changed to 1
> 88 08:29:57 02/28/2014 NO safApp=safAmfService "CCB 7 Modified safAmfNode=SC-2,safAmfCluster=myAmfCluster
> 89 08:29:57 02/28/2014 NO safApp=safAmfService "safAmfNode=SC-2,safAmfCluster=myAmfCluster saAmfNodeFailfastOnTerminationFailure changed to 1
> 90 08:29:57 02/28/2014 NO safApp=safAmfService "safComp=AmfDemo,safSu=SU1,safSg=AmfDemo,safApp=AmfDemo1 PresenceState RESTARTING => TERMINATION_FAILED
> 91 08:29:57 02/28/2014 NO safApp=safAmfService "Ordering reboot of 'safAmfNode=SC-1,safAmfCluster=myAmfCluster' as repair action
>
>
> Case 2:
> =================
>
> Node not rebooted (as expected), but repair does not fully work (yet):
>
> Feb 28 08:45:41 SC-1 local0.notice osafimmnd[382]: NO Ccb 6 COMMITTED (immcfg_SC-1_663)
> Feb 28 08:45:41 SC-1 user.notice amf_demo[638]: exiting (caught term signal)
> Feb 28 08:45:41 SC-1 local0.notice osafamfnd[447]: NO 'safComp=AmfDemo,safSu=SU1,safSg=AmfDemo,safApp=AmfDemo1' faulted due to 'avaDown' : Recovery is 'componentRestart'
> Feb 28 08:45:41 SC-1 local0.notice osafamfnd[447]: NO Cleanup of 'safComp=AmfDemo,safSu=SU1,safSg=AmfDemo,safApp=AmfDemo1' failed
> Feb 28 08:45:41 SC-1 local0.notice osafamfnd[447]: NO Reason:'Exec of script success, but script exits with non-zero status'
> Feb 28 08:45:41 SC-1 local0.notice osafamfnd[447]: NO Exit code: 1
> Feb 28 08:45:41 SC-1 local0.warn osafamfnd[447]: WA 'safComp=AmfDemo,safSu=SU1,safSg=AmfDemo,safApp=AmfDemo1' Presence State RESTARTING => TERMINATION_FAILED
> Feb 28 08:45:41 SC-1 local0.notice osafamfnd[447]: NO 'safSu=SU1,safSg=AmfDemo,safApp=AmfDemo1' Presence State INSTANTIATED => TERMINATION_FAILED
> Feb 28 08:45:43 SC-1 local0.notice osafamfnd[447]: NO Repair request for 'safSu=SU1,safSg=AmfDemo,safApp=AmfDemo1'
> Feb 28 08:45:43 SC-1 local0.notice osafamfnd[447]: NO 'safSu=SU1,safSg=AmfDemo,safApp=AmfDemo1' Presence State TERMINATION_FAILED => UNINSTANTIATED
>
> The SU stays uninstantiated yet enabled:
>
> 88 08:45:41 02/28/2014 NO safApp=safAmfService "safComp=AmfDemo,safSu=SU1,safSg=AmfDemo,safApp=AmfDemo1 PresenceState RESTARTING => TERMINATION_FAILED
> 89 08:45:41 02/28/2014 NO safApp=safAmfService "safSu=SU1,safSg=AmfDemo,safApp=AmfDemo1 OperState ENABLED => DISABLED
> 90 08:45:43 02/28/2014 NO safApp=safAmfService "Admin op "REPAIRED" initiated for 'safSu=SU1,safSg=AmfDemo,safApp=AmfDemo1', invocation: 73014444033
> 91 08:45:43 02/28/2014 NO safApp=safAmfService "safSu=SU1,safSg=AmfDemo,safApp=AmfDemo1 PresenceState TERMINATION_FAILED => UNINSTANTIATED
> 92 08:45:43 02/28/2014 NO safApp=safAmfService "Admin op done for invocation: 73014444033, result 1
> 93 08:45:43 02/28/2014 NO safApp=safAmfService "safSu=SU1,safSg=AmfDemo,safApp=AmfDemo1 OperState DISABLED => ENABLED
>
> even though the repair succeeds.
>
>
> Conditions of Submission:
> -------------------------
> Ack from reviewers
>
>
> Arch        Built     Started    Linux distro
> -------------------------------------------
> mips          n          n
> mips64        n          n
> x86           n          n
> x86_64        y          y
> powerpc       n          n
> powerpc64     n          n
>
>
> Reviewer Checklist:
> -------------------
> [Submitters: make sure that your review doesn't trigger any checkmarks!]
>
>
> Your checkin has not passed review because (see checked entries):
>
> ___ Your RR template is generally incomplete; it has too many blank entries
>     that need proper data filled in.
>
> ___ You have failed to nominate the proper persons for review and push.
>
> ___ Your patches do not have proper short+long header
>
> ___ You have grammar/spelling in your header that is unacceptable.
>
> ___ You have exceeded a sensible line length in your headers/comments/text.
>
> ___ You have failed to put in a proper Trac Ticket # into your commits.
>
> ___ You have incorrectly put/left internal data in your comments/files
>     (i.e. internal bug tracking tool IDs, product names etc)
>
> ___ You have not given any evidence of testing beyond basic build tests.
>     Demonstrate some level of runtime or other sanity testing.
>
> ___ You have ^M present in some of your files. These have to be removed.
>
> ___ You have needlessly changed whitespace or added whitespace crimes
>     like trailing spaces, or spaces before tabs.
>
> ___ You have mixed real technical changes with whitespace and other
>     cosmetic code cleanup changes. These have to be separate commits.
>
> ___ You need to refactor your submission into logical chunks; there is
>     too much content in a single commit.
>
> ___ You have extraneous garbage in your review (merge commits etc)
>
> ___ You have giant attachments which should never have been sent;
>     Instead you should place your content in a public tree to be pulled.
>
> ___ You have too many commits attached to an e-mail; resend as threaded
>     commits, or place in a public tree for a pull.
>
> ___ You have resent this content multiple times without a clear indication
>     of what has changed between each re-send.
>
> ___ You have failed to adequately and individually address all of the
>     comments and change requests that were proposed in the initial review.
>
> ___ You have a misconfigured ~/.hgrc file (i.e. username, email etc)
>
> ___ Your computer has a badly configured date and time, confusing
>     the threaded patch review.
>
> ___ Your changes affect the IPC mechanism, and you don't present any results
>     for an in-service upgradability test.
>
> ___ Your changes affect the user manual and documentation, but your patch
>     series does not contain the patch that updates the Doxygen manual.
>
>
> _______________________________________________
> Opensaf-devel mailing list
> [email protected]
> https://lists.sourceforge.net/lists/listinfo/opensaf-devel
>
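
To make reviewing easier, here is a rough sketch of the repair decision the "reboot node when term-failed SU" changeset describes. It is only an illustration, not the code in the patch: it assumes that "all the repair conditions on SG and node" means exactly the three attributes set in the test script (saAmfSGAutoRepair, saAmfNodeAutoRepair and saAmfNodeFailfastOnTerminationFailure), and every type and function name in it is hypothetical rather than taken from the amfd sources.

```cpp
// Hypothetical sketch of the TERM-FAILED repair decision.
// None of these names come from the amfd sources.

enum class PresenceState {
  UNINSTANTIATED,
  INSTANTIATED,
  RESTARTING,
  TERMINATION_FAILED
};

enum class RepairAction {
  NONE,                   // comp is not in TERM-FAILED, nothing to decide
  WAIT_FOR_ADMIN_REPAIR,  // SU stays disabled until an admin REPAIRED op
  REBOOT_NODE             // director orders a node reboot as repair action
};

// Assumption: these three attributes are the complete set of repair
// conditions referred to in the changeset description.
struct RepairConfig {
  bool sg_auto_repair;                 // saAmfSGAutoRepair
  bool node_auto_repair;               // saAmfNodeAutoRepair
  bool node_failfast_on_term_failure;  // saAmfNodeFailfastOnTerminationFailure
};

// A reboot is ordered only when the comp is TERM-FAILED and every repair
// condition is true; otherwise the SU is left for manual repair.
constexpr RepairAction decide_repair(PresenceState comp_state,
                                     const RepairConfig& cfg) {
  return comp_state != PresenceState::TERMINATION_FAILED
             ? RepairAction::NONE
         : (cfg.sg_auto_repair && cfg.node_auto_repair &&
            cfg.node_failfast_on_term_failure)
             ? RepairAction::REBOOT_NODE
             : RepairAction::WAIT_FOR_ADMIN_REPAIR;
}

// Test case 1: all three attributes set to 1, node is rebooted.
static_assert(decide_repair(PresenceState::TERMINATION_FAILED,
                            RepairConfig{true, true, true}) ==
                  RepairAction::REBOOT_NODE,
              "case 1 should order a node reboot");

// Test case 2: saAmfSGAutoRepair=0, node is not rebooted.
static_assert(decide_repair(PresenceState::TERMINATION_FAILED,
                            RepairConfig{false, true, true}) ==
                  RepairAction::WAIT_FOR_ADMIN_REPAIR,
              "case 2 should wait for admin repair");
```

The two static_assert lines mirror test cases 1 and 2 above. Whether the real check treats the attributes exactly this way is for the patch itself to confirm.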
