This ticket is classified as a major defect.
This implies that the problem is either common or that its effects are serious.
I don't believe it is either common or serious.
Trying to fix this with a solution that ignores the response from the PBE turns the problem into a critically serious (dangerous) one. The frequency of the problem will also likely go up, since applications will adapt to this new "use case".
One major question I have is: what is the application that causes this issue to happen?
The OpenSAF IMM with PBE has worked this way for many years, and we have not seen this issue until now. Why is that?
The problem with ignoring the reply from the PBE and just "assuming" the commit (or abort) status for the PBE is that you have a 50% chance of the action (commit or abort) in RAM diverging from the PBE. The reply is sent to the application. The next time you re-enable the PBE, you have a 50% chance of the PBE (sqlite database) being inconsistent with the RAM.
This divergence may not be discovered immediately. You could run the cluster for days or weeks until suddenly the inconsistency gets exposed.
Believe me, you don't want to go there.
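To make the divergence argument concrete, here is a toy model in Python. It is not OpenSAF code: the function names and the coin-flip outcome are assumptions made purely for illustration, with the coin flip standing in for the 50% figure above. One path guesses the outcome of each critical CCB in RAM without waiting for the PBE, the other applies whatever the PBE reports.

```python
import random

def pbe_outcome(rng):
    """Hypothetical stand-in for what the PBE actually did with a critical CCB."""
    return rng.choice(["commit", "abort"])

def resolve_by_assumption(ccb_ids, assumed, rng):
    """Ignore the PBE reply and apply `assumed` in RAM for every CCB.

    Returns the CCB ids where RAM and PBE ended up disagreeing.
    """
    ram, pbe = {}, {}
    for ccb in ccb_ids:
        ram[ccb] = assumed           # outcome is guessed, reply is ignored
        pbe[ccb] = pbe_outcome(rng)  # what the sqlite file really recorded
    return [ccb for ccb in ccb_ids if ram[ccb] != pbe[ccb]]

def resolve_by_reply(ccb_ids, rng):
    """Wait for the PBE reply and apply exactly that outcome in RAM."""
    ram, pbe = {}, {}
    for ccb in ccb_ids:
        outcome = pbe_outcome(rng)
        pbe[ccb] = outcome
        ram[ccb] = outcome
    return [ccb for ccb in ccb_ids if ram[ccb] != pbe[ccb]]

if __name__ == "__main__":
    ids = list(range(1000))
    guessed = resolve_by_assumption(ids, "abort", random.Random())
    honored = resolve_by_reply(ids, random.Random())
    print(f"diverged when guessing: {len(guessed)} of {len(ids)}")
    print(f"diverged when honoring the reply: {len(honored)} of {len(ids)}")
```

Nothing in the guessing path fails at the time of the guess. The mismatch only shows up when the two copies are compared, which in the real system is whenever the PBE content is next relied upon.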
Again, is this really a common problem now?
Why is that?
Again, I give you the solutions that are not just better but are actually *solutions*, as opposed to dangerous hacks, if and when this issue occurs:
1) Re-enable the PBE.
2) Restart the cluster.
3) Or, even better, take true responsibility in the upgrade campaign for the system/deployment you are upgrading by quiescing the part that is overloading the PBE and/or the file system.
The problem is extremely rare unless you provoke it with an application that radically departs from supported CCB behavior, or unless the file system the PBE is mounted on is overloaded, typically by being shared with some file-heavy application.
We are deliberating whether to compromise consistency and availability here for the sake of allowing applications/usage that are not supported.
The whole purpose of SAF is to cater to high availability with maintained consistency.
If you lose consistency, you will lose availability. The two are not tradeable.
The availability that SAF targets is the payload application availability. So don't mistake the goal of HA to be that IMM config data changes have to be highly available.
Config data changes should never be the solution to availability emergencies.
And if they end up being so, you had better hope that the config data is consistent and that config data changes are not blocked by NON config data changes.
The root of this non-critical and rare problem is that NON config data changes are interfering with CCB throughput.
---
** [tickets:#2229] imm:disable pbe should honor critical ccbs**
**Status:** review
**Milestone:** 5.2.FC
**Created:** Wed Dec 14, 2016 09:29 AM UTC by Neelakanta Reddy
**Last Updated:** Thu Dec 15, 2016 04:48 AM UTC
**Owner:** Neelakanta Reddy
Reproducible steps:
1. Bring up the cluster with PBE configured.
2. Enable PBE.
3. Run multiple CCB operations in parallel.
4. Disable PBE.
5. On one of the payloads/controllers, restart the immnd/node.
6. Sync will be aborted with the following messages:
WA PBE has been disabled with ccbs in critical state - To resolve: Enable PBE or resart/reload the cluster
NO Still waiting for existing Ccbs to terminate after 20.027520 seconds. Aborting this sync attempt
7. The IMMND will never get synced until cluster restart.
The problem is observed when the node does not join during a middleware upgrade, and eventually the upgrade fails.
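The behavior in steps 6 and 7 can be pictured with a small Python sketch. This is a toy model and not the IMMND implementation: the function `try_sync`, the tick-based wait, and the idea that one critical CCB resolves per tick while the PBE is enabled are all assumptions chosen only to mirror the log messages above.

```python
def try_sync(critical_ccbs, pbe_enabled, max_wait_ticks=20):
    """Hypothetical sync attempt that waits for critical CCBs to terminate.

    Returns "synced" if no CCB is left in critical state within the wait
    budget, otherwise "aborted".
    """
    pending = set(critical_ccbs)
    for _ in range(max_wait_ticks):
        if not pending:
            return "synced"
        if pbe_enabled:
            # Assumption: a PBE reply resolves one critical CCB per tick.
            pending.pop()
        # With the PBE disabled no reply arrives, so nothing ever resolves.
    return "synced" if not pending else "aborted"

if __name__ == "__main__":
    print(try_sync({101, 102, 103}, pbe_enabled=False))  # aborted
    print(try_sync({101, 102, 103}, pbe_enabled=True))   # synced
```

In this model every retry with the PBE disabled ends the same way, which matches the report that the node stays unsynced until the PBE is enabled again or the cluster is restarted.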
---