Hi,

Please see questions inline:

On 20-Mar-15 1:25 AM, Shu Wang wrote:
> We have a scenario when nodes lost contact for 10 seconds and rejoined, some
> service units ended up in Terminating state.
>
> For example, the following message was seen from /var/log/messages:
> NO Lost contact with 'appbox'
>
> We saw some service units on the same box disabled. Then we performed lock
> and lock-in on the disabled service unit:
> amf-adm lock safSu=amfSU2.1,safSg=amfSG2,safApp=myApp
> amf-adm lock-in safSu=amfSU2.1,safSg=amfSG2,safApp=myApp
>
> Then we tried the following commands:
> amf-adm repaired safSu=amfSU2.1,safSg=amfSG2,safApp=myApp
> amf-adm unlock-in safSu=amfSU2.1,safSg=amfSG2,safApp=myApp
>
> For either repaired or unlock-in, we got the following error:
> error - command timed out (alarm)
>
> SU state stayed as:
> safSu=amfSU2.1,safSg=amfSG2,safApp=myApp
> saAmfSUAdminState=LOCKED-INSTANTIATION(3)
> saAmfSUOperState=ENABLED(1)
> saAmfSUPresenceState=TERMINATING(4)
> saAmfSUReadinessState=OUT-OF-SERVICE(1)
>

Which OpenSAF release are you using?
What is the recovery policy of the SU?
Do you see any fault reported by AMF on any component of this SU in the
syslog (for example, an SU failover)?

Also note that link flaps are not supported. Assuming this is a scenario where
the link is brought down (for example, the interface goes down or the cable is
unplugged, either of which leads to loss of the socket connection with that
node), the ideal behaviour is that the node leaves the cluster and cannot
rejoin without a restart of OpenSAF (including re-establishing the network
connection).

Thanks,
Praveen

> Eventually we had to stop the node and restart the node to bring things back
> to normal.
>
> Why disabled service unit stuck at TERMINATING state? What made a service
> unit stuck at TERMINATING state?
> If a node is lost for a little while, what are the effects of the node lost
> contact in the cluster?
> How to repair the damage caused by the node lost?
>
> Thanks!
>
> Shu Wang | Senior Analyst | +1(407)708-5117 or x3917 | www.NetCracker.com
> Proven Partner to Communications Service Providers

_______________________________________________
Opensaf-users mailing list
https://lists.sourceforge.net/lists/listinfo/opensaf-users
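
When gathering the information asked for above, it may help to capture the SU state dump in a form that can be checked automatically. The following is only a rough sketch in Python, assuming the dump has the same layout as the one quoted in this thread. The names parse_su_states and looks_stuck are hypothetical helpers, not part of OpenSAF, and the "stuck" condition is a heuristic taken from the reported symptom, not an AMF rule.

```python
import re

# Matches lines such as "saAmfSUPresenceState=TERMINATING(4)".
STATE_LINE = re.compile(r"^\s*(saAmfSU\w+State)=([A-Z-]+)\((\d+)\)\s*$")


def parse_su_states(dump):
    """Return {attribute name: state name} for each saAmfSU*State line."""
    states = {}
    for line in dump.splitlines():
        match = STATE_LINE.match(line)
        if match:
            states[match.group(1)] = match.group(2)
    return states


def looks_stuck(states):
    """Heuristic (assumption): the SU is locked for instantiation but its
    presence state is still reported as TERMINATING."""
    return (
        states.get("saAmfSUAdminState") == "LOCKED-INSTANTIATION"
        and states.get("saAmfSUPresenceState") == "TERMINATING"
    )


# State dump copied from the report quoted in this thread.
SAMPLE = """safSu=amfSU2.1,safSg=amfSG2,safApp=myApp
saAmfSUAdminState=LOCKED-INSTANTIATION(3)
saAmfSUOperState=ENABLED(1)
saAmfSUPresenceState=TERMINATING(4)
saAmfSUReadinessState=OUT-OF-SERVICE(1)
"""

if __name__ == "__main__":
    su_states = parse_su_states(SAMPLE)
    print(su_states)
    print("stuck:", looks_stuck(su_states))
```

This only classifies a dump that has already been collected. It does not query AMF and does not replace the syslog check for reported component faults.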
