We have a scenario in which nodes lost contact for 10 seconds and then rejoined. 
Afterwards, some service units ended up in the TERMINATING state.

For example, the following message appeared in /var/log/messages:
NO Lost contact with 'appbox'

We saw that some service units on the same box were disabled. We then performed 
lock and lock-in on one of the disabled service units:
amf-adm lock safSu=amfSU2.1,safSg=amfSG2,safApp=myApp
amf-adm lock-in safSu=amfSU2.1,safSg=amfSG2,safApp=myApp

Then we tried the following commands:
amf-adm repaired safSu=amfSU2.1,safSg=amfSG2,safApp=myApp
amf-adm unlock-in safSu=amfSU2.1,safSg=amfSG2,safApp=myApp

Both repaired and unlock-in failed with the following error:
error - command timed out (alarm)

The SU state remained:
safSu=amfSU2.1,safSg=amfSG2,safApp=myApp
         saAmfSUAdminState=LOCKED-INSTANTIATION(3)
         saAmfSUOperState=ENABLED(1)
         saAmfSUPresenceState=TERMINATING(4)
         saAmfSUReadinessState=OUT-OF-SERVICE(1)
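
To spot affected service units quickly, a state listing in the format above can 
be scanned with a small script. This is only a minimal sketch: it assumes the 
listing looks exactly like the output shown in this post, and the names 
parse_su_states and stuck_terminating are hypothetical helpers, not part of 
OpenSAF.

```python
import re

# Sample input copied from the state listing in this post.
SAMPLE = """\
safSu=amfSU2.1,safSg=amfSG2,safApp=myApp
         saAmfSUAdminState=LOCKED-INSTANTIATION(3)
         saAmfSUOperState=ENABLED(1)
         saAmfSUPresenceState=TERMINATING(4)
         saAmfSUReadinessState=OUT-OF-SERVICE(1)
"""

# Matches an indented "attribute=STATE(number)" line.
ATTR_RE = re.compile(r"\s+(\w+)=([A-Z-]+)\(\d+\)")


def parse_su_states(text):
    """Return {su_dn: {attribute: state_name}} from a state listing.

    Assumption: each SU starts with an unindented 'safSu=...' line,
    followed by indented attribute lines.
    """
    states = {}
    current = None
    for line in text.splitlines():
        if not line.strip():
            continue
        if line.startswith("safSu="):
            current = line.strip()
            states[current] = {}
        elif current is not None:
            match = ATTR_RE.match(line)
            if match:
                states[current][match.group(1)] = match.group(2)
    return states


def stuck_terminating(states):
    """Return the DNs of SUs whose presence state is TERMINATING."""
    return [
        dn
        for dn, attrs in states.items()
        if attrs.get("saAmfSUPresenceState") == "TERMINATING"
    ]


if __name__ == "__main__":
    for dn in stuck_terminating(parse_su_states(SAMPLE)):
        print(dn)
```

Run against the listing above, this reports the one SU from this post. It only 
identifies SUs in that state; it does not explain or repair them.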

Eventually we had to stop and restart the node to bring things back to normal.

Our questions:
Why did the disabled service unit get stuck in the TERMINATING state?
If a node loses contact for a short while, what effect does that have on the 
cluster?
How can we repair the damage caused by the lost contact without restarting the 
node?

Thanks!

Shu Wang | Senior Analyst | +1(407)708-5117 or x3917| www.NetCracker.com
Proven Partner to Communications Service Providers



