Re: [devel] [PATCH 1/1] amfd: reboot nodes that report conflicting 2N active assignments [#2920]

Gary Lee Mon, 03 Sep 2018 03:52:37 -0700

Hi Nagendra

I think we must minimise the time the 2N SUs are active concurrently. IMO it's better for the nodes to be unavailable for a brief amount of time with a reboot than to have data inconsistency. The longer the 2N SUs are actively assigned concurrently, the higher the risk. We already know one of the nodes must have been split from the main network partition, so there is a chance other SGs on the node are affected, e.g. too many NwayActive assignments, or other duplicate 2N assignments.


Gary


On 03/09/18 20:33, [email protected] wrote:

Hi Gary,
Thanks for your response.

Susi delete will be a little slower in resolving the conflicts, but the advantage it has over reboot is that it doesn't impact other applications. The other advantage of susi delete is that SUs stay more available for workload assignments: a rebooted node takes its own time to come back and instantiate its SUs. Also, I think susi delete of one SU will do.

Going forward, we can inform the applications that their assignments are being removed because of a re-merge after a split (either by CSI or by OsafCsiAttributeChangeCallbackT); that would help them take their own actions, like syncing the DB, etc.

My take is that Amf shouldn't use reboot in any case; we need to recover from these situations by ourselves. As HA software, we need to adopt a self-healing approach.

What do the other co-maintainers say?
Thanks,
Nagendra, 91-9866424860
High Availability Solutions Pvt. Ltd. (www.hasolutions.in)
- OpenSAF Support and Services
 --------- Original Message ---------

    Subject: Re: [PATCH 1/1] amfd: reboot nodes that report
    conflicting 2N active assignments [#2920]
    From: "Gary Lee" <[email protected]>
    Date: 9/3/18 1:36 pm
    To: [email protected], [email protected],
    [email protected]
    Cc: [email protected]

    Hi Nagendra

    On 03/09/18 17:50, [email protected] wrote:
    > Hi Gary,
    > I have a few questions:
    > 1. Do we really want to reboot both the nodes in case of conflicts?

    That's a good question. Should a cluster reboot also be considered? I
    have proposed rebooting both nodes as it's somewhere in between. Keep in
    mind other SG types could also be affected, but not picked up.

    > 2. Even if we want to send the reboot to only one node, which node
    > should we send it to? The one that was part of the smaller cluster?

    I think we should keep it simple for this ticket, as it's really just a
    stop gap. Something like #2918 should be considered.

    > 3. If we could differentiate here that the conflicts happened because
    > of a re-merge, then would a susi_delete message (here also, we need to
    > decide which SU's susi needs to be deleted) do rather than a reboot?
    > Rebooting would be a little too harsh for other applications running on
    > the nodes; this is just my understanding.

    > 4. In general, what do we assume if the partition is merged? The
    > applications will for sure be out of sync, so will just deleting the
    > susi do, or do we need to reboot for sure? This is just for my
    > understanding, as I am not much aware of the actual application-level
    > impact (in terms of database, its behavior, etc.).

    I think we want to resolve the conflicting state as soon as possible.
    Would deleting the susi be potentially slower than issuing a reboot?

    Thanks
    Gary

