Hi Praveen,

Thanks for the review. I have commented inline.

 * Escalation and Recovery during SC absence period:
-Restarts will work as normal, but failover or switchover will result in Node
-Failfast. The repair action will be initiated when a SC returns if
-saAmfSGAutoRepair is enabled.
+Component and su restarts will work as normal. Any fail-over or switch-over at
+component, su, and node level will only cleanup faulty components. Recovery will
+be delayed until a SC returns: the fail-over or switch-over of SI assignments
+will be initiated if saAmfSGAutoRepair is enabled, the node will be rebooted if
+saAmfNodeAutoRepair, saAmfNodeFailfastOnTerminationFailure, or
+saAmfNodeFailfastOnInstantiationFailure is enabled.

[Praveen] I think there is no dependency of failover and switchover of
assignments on saAmfSGAutoRepair.
Should the sentence be like this?
" Recovery (failover or switchover of assignments) will be delayed until a
SC returns.
When the first SC comes up after the SC absence state, AMF will perform pending
repairs:

[Minh]: This part is about escalation and recovery, which is initiated by the
su_oper message. It does depend on saAmfSGAutoRepair, which is checked in
su_try_repair(), so I am not going to change the text.

+* Possible loss of RTA updates and SI assignment messages
+If both SCs go down abruptly (SCs are immediately powered-off for instance),
+AMFD could fail to update RTA to IMM, the SI assignment messages sent from
+AMFND could not reach to AMFD, recovery could be impossible.
+

[Praveen] Should we mention the case of loss of the assignment response from
AMFND to AMFD?
Also, I think we should mention the impact of this loss, something like:
"In case of loss of RTA and SI assignments, AMF will not be able to fully
recover assignments. Thus the application may go into an inconsistent state."

[Minh]: I have rewritten the text as:
"If both SCs go down abruptly (SCs are immediately powered off, for instance),
AMFD could fail to update RTA to IMM, the SI assignment request message sent
from AMFD might not reach AMFND, or the SI assignment response message sent
from AMFND might not reach AMFD. In such cases, recovery could be impossible
and the application may be left with inappropriate assignment states."

One query: it is known from ticket #2210 that loss of the mbcsv checkpoint
during SC failover in a normal cluster can also happen, similar to the loss of
RTA when both SCs go headless. As for the loss of SI assignment messages:
although AMFD uses MDS in a redundant view, the SI assignment is not
synchronized. I wonder what happens if someone abruptly powers off the active
controller when the active amfd is about to receive the assignment message, or
when amfnd has just sent the assignment response message but it does not reach
the amfds?

On 15/03/17 16:26, praveen malviya wrote:
> +saAmfNodeFailfastOnInstantiationFailure is enabled.
> [Praveen] I think there is no dependency of failover and switchover of
> assignments on saAmfSGAutoRepair.
> Should the sentence be like this?
> " Recovery (failover or switchover of assignments) will be delayed until a
> SC returns.
> When the first SC comes up after the SC absence state, AMF will perform pending
> repairs:
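
For illustration only, the delayed recovery described in the README text above
could be modelled roughly as below. This is a sketch in Python, not the amfd
code: RepairConfig, the field names, and both functions are made up for this
example, and only the four attribute names in the comments come from the patch
text.

```python
from dataclasses import dataclass


@dataclass
class RepairConfig:
    """Hypothetical holder for the attributes named in the README text."""
    sg_auto_repair: bool = False                     # saAmfSGAutoRepair
    node_auto_repair: bool = False                   # saAmfNodeAutoRepair
    failfast_on_termination_failure: bool = False    # saAmfNodeFailfastOnTerminationFailure
    failfast_on_instantiation_failure: bool = False  # saAmfNodeFailfastOnInstantiationFailure


def actions_during_sc_absence():
    """While no SC is present, fail-over/switch-over only cleans up faulty components."""
    return ["cleanup_faulty_components"]


def actions_when_sc_returns(cfg):
    """Deferred recovery once a SC is back, following the README wording."""
    actions = []
    # Fail-over or switch-over of SI assignments depends on SG auto repair.
    if cfg.sg_auto_repair:
        actions.append("failover_or_switchover_si_assignments")
    # Node reboot is triggered if any one of the three node-level attributes is set.
    if (cfg.node_auto_repair
            or cfg.failfast_on_termination_failure
            or cfg.failfast_on_instantiation_failure):
        actions.append("reboot_node")
    return actions
```

With everything disabled, the second function returns an empty list, which
corresponds to the case where nothing is repaired after the SC returns.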
