I think #1, #2, #3 and #4 should all be included in the scope of #1725; it helps
to have a global view of the solution for all cases. The actual implementation
might not be fully completed in 5.1, but we should at least agree on the way AMF
recovers in all cases after headless. Doing #4 after everything else may change
the solution for #1, #2 and #3. I think the gap between #3 and #4 should not be
big; the difference is the faults other than node restart during headless.
I agree #5 should be done last, since only 2N officially supports SI Dep, and
it should not impact the overall solution.
I would like to change the above order of implementation:
@0. We are here now: no admin op continuation, no recovery from faults during
headless.
Since componentRestart/suRestart has no impact on recovery after headless,
faults during headless here mean: failover escalation, or a node reboot/power-off
by the user during headless. These faults are different phenomena, but they all
result in the loss of SUSIs. Having #1902 will remove the major impact of a node
reboot caused by immediate escalation, but AMF still has to deal with the loss
of SUSIs, the same as without #1902, plus failover escalation.
@1. Admin op continuation, without requiring recovery from faults during headless
@1.a) All CSI callbacks complete during headless, but the SUSI states are
still QUIESCED/QUIESCING
@1.b) One of the CSI callbacks is still ongoing after headless (would AMFD
have to wait for it?)
@2. Recovery from faults. (Fault recovery needs to take into account the admin op
continuation, which would have been implemented in step @1.)
Needs #1902.
@2.a) Faults in normal flow: no admin op continuation is required after
headless, but a fault did happen during headless
@2.b) Faults happen during an admin operation while headless; after headless,
AMFD needs to consider recovery from the fault together with admin op continuation.
@3. @1 + @2 + SI Dep.
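
To make the proposed ordering easier to discuss, here is a minimal sketch that maps what AMFD finds after headless onto the step that would cover it. It is only an illustration: `HeadlessSituation`, its fields and `covering_step` are hypothetical names, not existing AMF code, and it assumes that SI Dep always pushes a case into @3.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class HeadlessSituation:
    """What AMFD finds after headless (all field names are hypothetical)."""
    admin_op_pending: bool       # an admin op was in progress when headless began
    fault_during_headless: bool  # failover escalation or node reboot/power-off
    si_dep_involved: bool        # the affected SIs have SI dependencies


def covering_step(s: HeadlessSituation) -> Optional[str]:
    """Return the step (@1, @2.a, @2.b, @3) that would cover the situation."""
    if not s.admin_op_pending and not s.fault_during_headless:
        return None  # nothing left over, no new recovery needed
    if s.si_dep_involved:
        return "@3"  # assumption: SI Dep is handled last, on top of @1 and @2
    if s.fault_during_headless:
        # @2.b when a fault is combined with an unfinished admin op
        return "@2.b" if s.admin_op_pending else "@2.a"
    return "@1"  # admin op continuation only
```

For example, an unfinished admin op with no fault maps to @1, while the same admin op plus a node reboot during headless maps to @2.b.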
---
**[tickets:#1725] AMF: Recover transient SUSIs left over from headless**
**Status:** accepted
**Milestone:** 5.1.FC
**Created:** Wed Apr 06, 2016 07:16 AM UTC by Minh Hon Chau
**Last Updated:** Wed Jul 27, 2016 10:18 AM UTC
**Owner:** Minh Hon Chau
This ticket is more of an enhancement, targeting how AMFD detects and
recovers the transient SUSIs left over from headless. There are three major
situations:
(1) - The cluster goes headless; su/node failover can happen on any payload, or
any payload can be hard rebooted/powered off by the operator; then the cluster
recovers.
(2) - An admin op is issued on any AMF entity, then the cluster goes headless.
During headless, the intermediate HA assignments of the whole admin op sequence
between AMFND and the components could be:
(2.1) The assignment completes, the component returns OK from the CSI callback,
then the cluster recovers.
(2.2) The assignment is still ongoing when the cluster recovers. The assignment
could complete afterward, or the CSI callback could return FAILED_OPERATION, or
an error could also happen.
By the time the cluster recovers, amfd has collected all assignments from all
amfnd(s). These assignments can be in assigned or assigning states, while their
HA states do not conform to the redundancy model of their SG. Any of (1), (2.1)
and (2.2) can happen in combination, which means that while an admin op is being
issued (2), the cluster can go headless and any kind of failover (1) can happen
during headless.
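
As a rough illustration of the detection part, the sketch below scans the assignments collected from all amfnd(s) and reports the SIs that are not in a stable 2N shape. Everything in it is an assumption made for discussion: the `Susi` record, the state labels and `find_transient_sis` are hypothetical, and the 2N check is simplified to "exactly one ACTIVE, at most one STANDBY, nothing assigning or quiesced/quiescing".

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class Susi:
    """One SU-SI assignment as reported by an amfnd (hypothetical record)."""
    su: str
    si: str
    ha_state: str   # "ACTIVE", "STANDBY", "QUIESCED" or "QUIESCING"
    fsm_state: str  # "ASSIGNED" or "ASSIGNING" (hypothetical labels)


def find_transient_sis(susis):
    """Return {si: [reasons]} for SIs that do not conform to simplified 2N."""
    by_si = defaultdict(list)
    for s in susis:
        by_si[s.si].append(s)

    problems = {}
    for si, group in by_si.items():
        reasons = []
        if any(s.fsm_state == "ASSIGNING" for s in group):
            reasons.append("assignment still in progress")
        if any(s.ha_state in ("QUIESCED", "QUIESCING") for s in group):
            reasons.append("quiesced/quiescing left over")
        actives = sum(s.ha_state == "ACTIVE" for s in group)
        if actives != 1:
            reasons.append(f"expected 1 active, found {actives}")
        standbys = sum(s.ha_state == "STANDBY" for s in group)
        if standbys > 1:
            reasons.append(f"expected at most 1 standby, found {standbys}")
        if reasons:
            problems[si] = reasons
    return problems
```

An SI whose only active SUSI was lost in a failover during headless (situation 1) and an SI left QUIESCED by an interrupted admin op (situation 2.1) would both show up in the result, each with its own reasons, which is where the recovery logic would have to decide what to do next.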