We have a scenario in which nodes lost contact for 10 seconds and then
rejoined, after which some service units ended up in the TERMINATING state.
For example, the following message appeared in /var/log/messages:
NO Lost contact with 'appbox'
We saw that some service units on the same box were disabled. We then performed
lock and lock-in on the disabled service unit:
amf-adm lock safSu=amfSU2.1,safSg=amfSG2,safApp=myApp
amf-adm lock-in safSu=amfSU2.1,safSg=amfSG2,safApp=myApp
Then we tried the following commands:
amf-adm repaired safSu=amfSU2.1,safSg=amfSG2,safApp=myApp
amf-adm unlock-in safSu=amfSU2.1,safSg=amfSG2,safApp=myApp
For both repaired and unlock-in, we got the following error:
error - command timed out (alarm)
The SU state stayed at:
safSu=amfSU2.1,safSg=amfSG2,safApp=myApp
saAmfSUAdminState=LOCKED-INSTANTIATION(3)
saAmfSUOperState=ENABLED(1)
saAmfSUPresenceState=TERMINATING(4)
saAmfSUReadinessState=OUT-OF-SERVICE(1)
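
For anyone who wants to script a check for this condition, here is a minimal
sketch that parses a state dump in the format shown above and flags the
combination we observed (admin state LOCKED-INSTANTIATION with presence state
TERMINATING). The function names parse_su_state and looks_stuck are
hypothetical, the sketch only works on text already captured, and it does not
call any OpenSAF tool.

```python
def parse_su_state(dump: str) -> dict:
    """Parse 'attribute=STATE(n)' lines into {attribute: STATE}.

    Assumes the dump format shown in this mail; the DN line (safSu=...)
    and any line without '=' are skipped.
    """
    states = {}
    for line in dump.splitlines():
        line = line.strip()
        if "=" not in line or line.startswith("safSu="):
            continue
        attr, _, value = line.partition("=")
        # Drop the numeric suffix, e.g. "TERMINATING(4)" -> "TERMINATING"
        name, _, _ = value.partition("(")
        states[attr.strip()] = name.strip()
    return states


def looks_stuck(states: dict) -> bool:
    """True if the SU is locked for instantiation but still terminating."""
    return (
        states.get("saAmfSUAdminState") == "LOCKED-INSTANTIATION"
        and states.get("saAmfSUPresenceState") == "TERMINATING"
    )


# The state dump quoted in this mail, used as sample input.
SAMPLE = """\
safSu=amfSU2.1,safSg=amfSG2,safApp=myApp
saAmfSUAdminState=LOCKED-INSTANTIATION(3)
saAmfSUOperState=ENABLED(1)
saAmfSUPresenceState=TERMINATING(4)
saAmfSUReadinessState=OUT-OF-SERVICE(1)
"""

if __name__ == "__main__":
    print(looks_stuck(parse_su_state(SAMPLE)))
```

A single positive result only shows the state at one moment; the check would
need to be repeated over time to tell a stuck SU from one that is still in the
middle of terminating.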
Eventually we had to stop and restart the node to bring things back to normal.
Why did the disabled service unit get stuck in the TERMINATING state? What can
cause a service unit to get stuck in that state?
If a node loses contact for a short while, what effects does that have on the
cluster?
How can we repair the damage caused by the lost node?
Thanks!
Shu Wang | Senior Analyst | +1(407)708-5117 or x3917| www.NetCracker.com
_______________________________________________
Opensaf-users mailing list
[email protected]
https://lists.sourceforge.net/lists/listinfo/opensaf-users