>>> "Neitzert, Greg A" <[email protected]> wrote on 05.11.2021 at 06:28 in message <dm8pr07mb8854271eec0543389891dc1088...@dm8pr07mb8854.namprd07.prod.outlook.com>
> Hello,
>
> With a Pacemaker 1.1.13/Corosync 2.3.5 cluster is it possible to define a
> relationship between two resources so that:
>
> 1. B depends on A (a normal order constraint)
>
> AND
>
> 2. If either fails, they both need to be stopped and restarted, in the
> order defined above (B stops, A stops, A starts, then B starts)

That would mean A depends on B, and B depends on A, probably hard to start ;-)

Reading the scenario below, I must admit that I don't fully understand the problem:
If SAN "seeds in" an error to LVM, and then FS (and eventually to the application), what will be the net effect? Filesystem hanging? FS seeing write errors? etc.?
How would a recovery be done? Fix the SAN? Just wait?
Could it be that your multipath configuration is simply done wrong?

regards,
Ulrich

>
> In the normal configuration, if A fails, then A and B will be restarted,
> because B depends on A. However, if B fails, only B is restarted because A
> does not depend on it. In most cases this is going to be fine, but we have a
> case where in some situations B is failing precisely because A above it is
> having a failure (but we don't know it yet).
>
> The order attribute takes care of the ordering of the start/stop (along with
> adding colocation so they stay on the same node).
>
> The problem I am trying to address is the case where the monitor for B fires
> first, and B is attempted to be restarted, but it won't work until A is.
>
> Case in point, LVM and Filesystem2 resources.
>
> If LVM needs to be refreshed, the Filesystem above it stops working (e.g.
> I/O fails). However, Filesystem noticed a problem first, and LVM didn't have
> a chance to see it also had a problem. Therefore, Filesystem will try to
> restart itself until it exhausts its retries. At that point, a cleanup is
> required to get things going again, and LVM has to be manually restarted.
>
> We have a case where the LVM cache needs to be refreshed and the volumes
> reactivated to clear up a problem caused by paths going down and coming back
> up in a SAN causing the LVM VG to get in a compromised state, and the LVM
> problem causes the Filesystem I/O to fail, and Filesystem notices first,
> monitor fails, it stops itself, and tries in vain to restart, because it will
> not until the LVM resource is restarted.
>
> I made the monitor interval longer for Filesystem than LVM which makes LVM
> find the problem first, but that isn't foolproof.
>
> If it was a rule that if a Filesystem resource needs to be stopped and
> started that the LVM resource it depends on has to be restarted first, I
> should be able to avoid the problem entirely.
>
> In essence, what I'm asking is if I can make two resource start and stop in
> a particular order, but also define that if one has to be started or stopped
> the other must as well (in my defined order).
>
> Thanks.
>
> Greg Neitzert | Lead Software Engineer | RTC Software Engineering 2B ‑
> Middleware
>
> Unisys Corp
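
To make the difference between the two behaviours in the question concrete, here is a minimal sketch in Python. It is only a toy model of the restart sequences described above, not Pacemaker code or configuration. The names `START_ORDER`, `restart_plan`, and the `mutual` flag are hypothetical and exist only for this illustration.

```python
# Toy model (assumption: not Pacemaker code) of the restart behaviour
# discussed in this thread. Resources are listed in start order.
START_ORDER = ["A", "B"]  # B depends on A, e.g. A = LVM, B = Filesystem


def restart_plan(failed, order=START_ORDER, mutual=False):
    """Return (stop_sequence, start_sequence) after `failed` fails.

    mutual=False models the plain order constraint described in the
    question: the failed resource and everything ordered after it restart.
    mutual=True models the requested behaviour: the whole chain restarts,
    whichever member failed.
    """
    if failed not in order:
        raise ValueError(f"unknown resource: {failed}")
    if mutual:
        affected = list(order)
    else:
        affected = order[order.index(failed):]
    # Stop in reverse start order, then start in start order.
    return list(reversed(affected)), list(affected)


if __name__ == "__main__":
    for failed in START_ORDER:
        for mutual in (False, True):
            stops, starts = restart_plan(failed, mutual=mutual)
            print(f"failed={failed} mutual={mutual}: stop {stops}, start {starts}")
```

With `mutual=False`, a failure of B yields a plan that touches only B, which is the situation where Filesystem keeps retrying on top of a stale LVM. With `mutual=True`, the same failure yields B stop, A stop, A start, B start, which is the sequence being asked for.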
