Hi Gary,
Ack from me.
Thanks,
Nagendra, 91-9866424860
High Availability Solutions Pvt. Ltd. (www.hasolutions.in)
- OpenSAF Support and Services
--------- Original Message ---------
Subject: Re: [PATCH 1/1] amfd: reboot nodes that report conflicting 2N active
assignments [#2920]
From: "Gary Lee" <[email protected]>
Date: 9/3/18 6:41 pm
To: [email protected]
Cc: "Hans Nordeback" <[email protected]>, [email protected],
[email protected]
Hi
I think the most important point is we cannot trust any state returned from the
payloads. Trying to reconcile what happened during the split seems futile.
We are better off rebooting the node so we have a known starting point and
reallocate assignments accordingly.
During the split, the PLs likely didn't have concurrent access to a shared
resource. Now that the network is merged, we could have lots of issues if both
PLs are modifying this resource assuming it has exclusivity.
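A minimal sketch of the check behind this reasoning, assuming each payload
reports its assignments as (node, SU, SI, HA state) tuples after the merge.
The tuple layout, the state strings and the function name are hypothetical
and are not taken from amfd; the sketch only shows that any node holding a
conflicting 2N active assignment ends up on the reboot list.

```python
from collections import defaultdict


def nodes_with_conflicting_active(reports):
    """Return the sorted nodes hosting an SI that is ACTIVE on more than one SU.

    `reports` is an iterable of (node, su, si, ha_state) tuples. This layout is
    an assumption made for illustration, not an amfd data structure.
    """
    active_by_si = defaultdict(set)
    for node, su, si, ha_state in reports:
        if ha_state == "ACTIVE":
            active_by_si[si].add((node, su))

    to_reboot = set()
    for holders in active_by_si.values():
        # A 2N SG expects a single active SU per SI; more than one is a conflict.
        if len(holders) > 1:
            to_reboot.update(node for node, _ in holders)
    return sorted(to_reboot)
```

Every node involved in a conflict is returned, which matches the "reboot both
nodes" proposal rather than trying to pick a winner.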
Gary
On 3 Sep 2018, at 11:00 pm, <[email protected]> <[email protected]>
wrote:
Hi Hans/Gary,
Thanks for your opinion.
I will presume that as long as the applications are declared healthy by AMF,
they are good to go.
I am just trying to find an alternative path, such as removing all assignments,
terminating the applications of that SG, and then doing unlock-in and unlock,
to avoid the impact a reboot has on other applications.
In this case, we will be removing the assignments and not giving new
assignments.
If they become faulty and end up in instantiation/termination failure, the
node will be rebooted anyway, provided saAmfNodeFailfastOnTerminationFailure
and saAmfNodeFailfastOnInstantiationFailure are set.
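A rough sketch of that fallback, assuming the two failfast attributes are
modelled as plain booleans. The state strings, the returned action names and
the function itself are hypothetical illustrations, not amfd code.

```python
from dataclasses import dataclass


@dataclass
class NodeConfig:
    # Modelled after saAmfNodeFailfastOnTerminationFailure and
    # saAmfNodeFailfastOnInstantiationFailure; plain booleans are an assumption.
    failfast_on_termination_failure: bool
    failfast_on_instantiation_failure: bool


def recovery_action(su_state, cfg):
    """Pick the next step after the SG's assignments were removed.

    `su_state` and the returned strings are hypothetical names.
    """
    if su_state == "TERMINATION_FAILED":
        return "reboot-node" if cfg.failfast_on_termination_failure else "needs-repair"
    if su_state == "INSTANTIATION_FAILED":
        return "reboot-node" if cfg.failfast_on_instantiation_failure else "needs-repair"
    # Healthy SU: continue with unlock-in and unlock, without giving new
    # assignments before that.
    return "unlock-in-then-unlock"
```

With both flags set, a reboot happens only for the SU that actually failed, so
other applications on healthy nodes are left alone.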
I fail to understand the application use case after a cluster merge. We need
to deactivate the SUs quickly, but while the cluster was split, both
applications were already active at the same time for a while. If you have a
use case, please share it.
Thanks,
Nagendra, 91-9866424860
--------- Original Message ---------
Subject: Re: [PATCH 1/1] amfd: reboot nodes that report conflicting 2N active
assignments [#2920]
From: "Hans Nordeback" <[email protected]>
Date: 9/3/18 4:27 pm
To: [email protected], "Gary Lee" <[email protected]>,
[email protected]
Cc: [email protected]
Hi,
I think AMF should avoid getting into this state. Resolving this state may be
difficult.
AMF should not make any new assignments/failovers when the state of the
failing node/component is not known,
i.e. we should prefer consistency over availability.
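As a small illustration of this rule, here is a sketch that assumes the
director tracks a per-node state with three hypothetical values: "UP",
"CONFIRMED_DOWN" and "UNKNOWN". None of these names come from AMF.

```python
def may_fail_over(failed_node_state):
    """Allow a new active assignment only if the old holder is known to be down.

    The state names are hypothetical. "UNKNOWN" covers a node that is
    unreachable but might still be running, as in a network split.
    """
    return failed_node_state == "CONFIRMED_DOWN"
```

Refusing to fail over on "UNKNOWN" costs availability during the split but
avoids having two active instances afterwards.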
/Thanks HansN
On 09/03/2018 12:33 PM, [email protected] wrote:
Hi Gary,
Thanks for your response.
A SUSI delete will be a little slower in resolving the conflict, but its
advantage over a reboot is that it doesn't impact other applications. Another
advantage is availability: after a reboot, SUs are unavailable for workload
assignment for longer than after a SUSI delete, because the node takes time to
come back and instantiate its SUs. Also, I think a SUSI delete on one SU will do.
Going forward, we could notify the applications that their assignments are
being removed because of a re-merge after a split (either via the CSI or via
OsafCsiAttributeChangeCallbackT). That would help them take their own actions,
such as syncing the DB.
My take is that AMF shouldn't use reboot in any case; we need to recover from
such situations ourselves. As HA software, we need to adopt a self-healing
approach.
What do the other co-maintainers say?
Thanks,
Nagendra, 91-9866424860
--------- Original Message ---------
Subject: Re: [PATCH 1/1] amfd: reboot nodes that report conflicting 2N active
assignments [#2920]
From: "Gary Lee" <[email protected]>
Date: 9/3/18 1:36 pm
To: [email protected], [email protected],
[email protected]
Cc: [email protected]
Hi Nagendra
On 03/09/18 17:50, [email protected] wrote:
> Hi Gary,
> I have a few questions:
> 1. Do we really want to reboot both the nodes in case of conflicts?
That's a good question. Should a cluster reboot also be considered? I
have proposed rebooting both nodes as it's somewhere in between. Keep in
mind that other SG types could also be affected, but would not be picked up.
> 2. Even if we want to reboot only one node, which node should we send
> the reboot to: the one that was part of the smaller cluster?
I think we should keep it simple for this ticket, as it's really just a
stop gap. Something like #2918 should be considered.
> 3. If we could determine here that the conflict happened because of a
> re-merge, would a susi_delete message (here too, we need to decide
> which SU's susi should be deleted) do rather than a reboot?
> Rebooting will be a little too harsh for other applications running on
> the nodes; that is just my understanding.
> 4. In general, what do we assume once the partition is merged? The
> applications will surely be out of sync, so will just deleting the
> susi do, or do we definitely need to reboot? This is just for my
> understanding, as I am not very aware of the actual application-level
> impact (in terms of database, behavior, etc.).
I think we want to resolve the conflicting state as soon as possible.
Would deleting the susi be potentially slower than issuing a reboot?
Thanks
Gary
_______________________________________________
Opensaf-devel mailing list
[email protected]
https://lists.sourceforge.net/lists/listinfo/opensaf-devel