I think the patch you sent is good (I will review it), but I think there is a 
race that can only be addressed by changing amfnd.

If the LOCK-IN admin command is received when an SU is UNINSTANTIATED but the 
repair order has already been sent from amfd to amfnd, we have the same problem 
again! The SU terminate order is received by amfnd, but by now the SU is in 
INSTANTIATING state, so the terminate escalates to cleanup of the component. If 
the cleanup script in this case succeeds without killing a previously started 
process, e.g. because there is no PID file, there will be a detached component 
process that has been started and is trying to register.

So if you are worried about the return code BAD-OPERATION (which I am not), then 
we could use ERR_LIBRARY, which unarguably would (or at least should) cause this 
program to exit.
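
To make the difference between the return codes concrete, here is a rough 
sketch in C of how a component's registration loop might react. It is not taken 
from OpenSAF: the `reg_result_t` values, `register_with_retry` and the two stub 
functions are hypothetical stand-ins so that the example is self-contained. A 
real component would call the AMF registration API and check `SaAisErrorT`.

```c
#include <assert.h>
#include <stdio.h>

/* Hypothetical stand-ins for the return codes discussed in this thread.
 * A real component would use SaAisErrorT from the AMF headers. */
typedef enum {
    REG_OK,
    REG_TRY_AGAIN,
    REG_BAD_OPERATION,
    REG_LIBRARY
} reg_result_t;

/* Hypothetical signature for the registration call being wrapped. */
typedef reg_result_t (*register_fn)(void *ctx);

/* Retry only on TRY_AGAIN, and only a bounded number of times.
 * Returns 0 when registered, -1 when the process should exit. */
int register_with_retry(register_fn do_register, void *ctx, int max_attempts)
{
    assert(do_register != NULL && max_attempts > 0);

    for (int attempt = 1; attempt <= max_attempts; attempt++) {
        reg_result_t rc = do_register(ctx);
        if (rc == REG_OK)
            return 0;
        if (rc != REG_TRY_AGAIN) {
            /* BAD_OPERATION, LIBRARY, ...: retrying cannot help. */
            fprintf(stderr, "register failed (code %d), exiting\n", (int)rc);
            return -1;
        }
        /* A real component would sleep here before the next attempt. */
    }
    fprintf(stderr, "register still TRY_AGAIN after %d attempts\n",
            max_attempts);
    return -1;
}

/* Stub: AMF keeps answering TRY_AGAIN (component is TERMINATING). */
reg_result_t stub_try_again(void *ctx)
{
    (*(int *)ctx)++; /* count the calls */
    return REG_TRY_AGAIN;
}

/* Stub: AMF rejects the registration outright. */
reg_result_t stub_bad_operation(void *ctx)
{
    (*(int *)ctx)++; /* count the calls */
    return REG_BAD_OPERATION;
}
```

With the TRY_AGAIN stub the detached process keeps calling until its own retry 
limit is reached (or forever, if it has none), whereas a definite error lets it 
exit after the first call.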



---

**[tickets:#807] AMF returns TRYAGAIN for saAmfRegister**

**Status:** review
**Milestone:** future
**Created:** Thu Mar 06, 2014 03:52 PM UTC by Hans Feldt
**Last Updated:** Thu Apr 03, 2014 06:42 AM UTC
**Owner:** Nagendra Kumar

Use case: node lock followed by node lock-instantiation. During the node lock, a 
component causes an SU failover followed by repair of the component, which means 
instantiation.

So we have a component in state INSTANTIATING when the SU terminate request 
comes. AMF then (silently) escalates this to execute cleanup of the already 
instantiating component and changes its state to TERMINATING. Due to timing the 
cleanup script does not find any process to kill and returns 0. At the same 
time the instantiate script starts a process that calls saAmfRegister which 
returns TRYAGAIN because the component is in TERMINATING state.

Suggestions:
- AMF should probably return BAD-OPERATION in this case (is there any valid 
case where it should return TRYAGAIN?)
- the reason for escalating to cleanup should be logged

Finally, should AMF really start a second CLC CLI script while it knows one is 
already running? This implies that the cleanup script must be able to kill the 
instantiate script, which is not stated in the specification; besides, I don't 
have any such script. So maybe AMF should kill the child process executing the 
instantiate script before it starts cleanup. This is a change in "core" since 
there is no interface to do this.
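
As a rough illustration of that last idea, here is a POSIX sketch in C of 
killing and reaping a child before anything else is started. It is not amfnd 
code: `start_fake_instantiate` and `kill_and_reap` are hypothetical names, and 
the forked child that just blocks is only a stand-in for the process running 
the instantiate script.

```c
#include <assert.h>
#include <errno.h>
#include <signal.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Stand-in for launching the instantiate script: fork a child that
 * simply blocks until it is signalled. Returns the child's PID, or -1. */
pid_t start_fake_instantiate(void)
{
    pid_t pid = fork();
    if (pid == 0) {
        for (;;)
            pause(); /* child: wait for a signal forever */
    }
    return pid; /* parent: child's PID, or -1 if fork failed */
}

/* Kill the child and wait until it has been reaped, so that nothing
 * started afterwards can race with it.
 * Returns 0 when the child is gone, -1 on error. */
int kill_and_reap(pid_t child)
{
    int status;
    pid_t r;

    assert(child > 0);

    /* ESRCH means the process no longer exists; still try to reap it. */
    if (kill(child, SIGKILL) == -1 && errno != ESRCH)
        return -1;

    do {
        r = waitpid(child, &status, 0);
    } while (r == -1 && errno == EINTR); /* retry if interrupted */

    return r == child ? 0 : -1;
}
```

Note that this only covers the direct child. If the instantiate script has 
already started a detached component process, that process would need to be 
handled separately, e.g. by running the script in its own process group.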



