The application has gone into the TERMINATION-FAILED state. From here on, AMF (OpenSAF) does not know, and cannot determine, what exactly is causing the problem with the application. Manual intervention is therefore necessary to fix the problem and restart OpenSAF or the node.
Mathi.

-----Original Message-----
From: praveen malviya
Sent: Thursday, February 12, 2015 10:05 AM
To: santosh satapathy; [email protected]
Subject: Re: [users] amf-adm issue on amfpm

On 12-Feb-15 2:21 AM, santosh satapathy wrote:
> Hi,
>
> I have integrated one of our applications with amfpm for passive
> monitoring. I executed the amf-adm unlock-in and unlock commands in
> sequence while the amfpm binary was not available. Forgot to install it to
> the node. And after that, I see the su state as below and it stays
> there forever, not responding to any of the amf-adm commands, reason
> being "WA Admin operation is already going on". I tried to start
> the SU after installing and putting required bins but even after 1
> hour I found the same thing. How to get out of this state?
>
> [root@mgt-a bin]# amf-state su all
> safSu=testsu,safSg=testsg,safApp=TestApp
> safSu=testsu,safSg=testsg,safApp=TestApp
> saAmfSUAdminState=UNLOCKED(1)
> saAmfSUOperState=ENABLED(1)
> saAmfSUPresenceState=TERMINATION-FAILED(7)
> saAmfSUReadinessState=IN-SERVICE(2)

I think it is an NPI application. If amfpm is missing, AMF will declare that component faulty. After this, AMF will try to clean up the component. Here the SU is marked TERM_FAILED, which means AMF could not clean up the component successfully. Please check why the cleanup failed. There is an open ticket, #538, for this reported case.

However, to allow AMF to automatically perform recovery and repair whenever a SU moves to the TERM_FAILED state, enable the node-level attribute saAmfNodeFailfastOnTerminationFailure=1 along with saAmfNodeAutoRepair and saAmfSGAutoRepair. When all these attributes are enabled, AMF will perform node failfast recovery whenever a SU enters the TERM_FAILED state.

In this reported case the SG is unstable, so AMF will not accept any admin operation, such as the repair admin operation, on the SU. So restore the amfpm binary and reboot the node manually. When the node joins the cluster again, the SU will get instantiated.

Thanks,
Praveen

> [root@mgt-a bin]#
>
> Logs:
> =====
>
> Feb 11 14:38:44 mgt-a osafamfd[23083]: WA Admin operation is already going on (su'safSu=testsu,safSg=testsg,safApp=TestApp')
> Feb 11 14:38:45 mgt-a osafamfd[23083]: WA Admin operation is already going on (su'safSu=testsu,safSg=testsg,safApp=TestApp')
> Feb 11 14:38:46 mgt-a osafamfd[23083]: WA Admin operation is already going on (su'safSu=testsu,safSg=testsg,safApp=TestApp')
> Feb 11 14:38:47 mgt-a osafamfd[23083]: WA Admin operation is already going on (su'safSu=testsu,safSg=testsg,safApp=TestApp')
> Feb 11 14:38:48 mgt-a osafamfd[23083]: WA Admin operation is already going on (su'safSu=testsu,safSg=testsg,safApp=TestApp')
> Feb 11 14:38:49 mgt-a osafamfd[23083]: WA Admin operation is already going on (su'safSu=testsu,safSg=testsg,safApp=TestApp')
>
>
>
> Feb 11 15:31:29 mgt-a osafamfd[23083]: WA Admin operation is already going on (su'safSu=testsu,safSg=testsg,safApp=TestApp')
> Feb 11 15:31:30 mgt-a osafamfd[23083]: WA Admin operation is already going on (su'safSu=testsu,safSg=testsg,safApp=TestApp')
> Feb 11 15:31:31 mgt-a osafamfd[23083]: WA Admin operation is already going on (su'safSu=testsu,safSg=testsg,safApp=TestApp')
> Feb 11 15:31:32 mgt-a osafamfd[23083]: WA Admin operation is already going on (su'safSu=testsu,safSg=testsg,safApp=TestApp')
> Feb 11 15:31:33 mgt-a osafamfd[23083]: WA Admin operation is already going on (su'safSu=testsu,safSg=testsg,safApp=TestApp')
> Feb 11 15:31:34 mgt-a osafamfd[23083]: WA Admin operation is already going on (su'safSu=testsu,safSg=testsg,safApp=TestApp')
> Feb 11 15:31:35 mgt-a osafamfd[23083]: WA Admin operation is already going on (su'safSu=testsu,safSg=testsg,safApp=TestApp')
>
------------------------------------------------------------------------------
Dive into the World of Parallel Programming.
The Go Parallel Website, sponsored by Intel and developed in partnership with Slashdot Media, is your hub for all things parallel software development, from weekly thought leadership blogs to news, videos, case studies, tutorials and more. Take a look and join the conversation now.
http://goparallel.sourceforge.net/
_______________________________________________
Opensaf-users mailing list
[email protected]
https://lists.sourceforge.net/lists/listinfo/opensaf-users
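
Note on the attribute settings suggested in Praveen's reply: the following is only a rough sketch of how the three attributes might be set with immcfg, not a tested procedure. It prints the commands for review instead of running them. The node DN (safAmfNode=mgt-a,safAmfCluster=myAmfCluster) and the function name print_failfast_cmds are assumptions made up for illustration; only the SG DN comes from the amf-state output quoted in the thread. The SG attribute is written as saAmfSGAutoRepair, which should be checked against the IMM schema on your system.

```shell
#!/bin/sh
# Dry-run sketch: prints immcfg commands instead of executing them.
# ASSUMPTION: the node DN used in the example call is hypothetical;
# look up the real DN in your IMM configuration before using this.

print_failfast_cmds() {
    node_dn="$1"   # AMF node object (safAmfNode=...,safAmfCluster=...)
    sg_dn="$2"     # service group that owns the affected SU

    # Node level: fail the node fast when a SU enters TERMINATION-FAILED.
    echo "immcfg -a saAmfNodeFailfastOnTerminationFailure=1 $node_dn"
    # Node level: let AMF repair the node automatically after recovery.
    echo "immcfg -a saAmfNodeAutoRepair=1 $node_dn"
    # SG level: let AMF repair the SUs of this service group automatically.
    echo "immcfg -a saAmfSGAutoRepair=1 $sg_dn"
}

# Example call: hypothetical node DN, SG DN taken from the thread.
print_failfast_cmds \
    "safAmfNode=mgt-a,safAmfCluster=myAmfCluster" \
    "safSg=testsg,safApp=TestApp"
```

Once the printed commands have been checked against the real DNs, they can be run by hand on a controller node. As the reply explains, this only affects future TERMINATION-FAILED events; a SU that is already stuck still needs the binary restored and a manual node reboot.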
