>>> Klaus Wenninger <kwenn...@redhat.com> wrote on 17.02.2022 at 13:34 in
message
<CALrDAo3C9mOG1F2Pk2xtcwk_b399f9XgY=wh8+oqdc4c5le...@mail.gmail.com>:
...
> But feedback is welcome so that we can do a little tweaking that makes them
> fit
> for a larger audience.
> Remember a case where devices stalled for 50s during a firmware update -
> that shouldn't trigger fencing; definitely a case that can't be covered by
> defaults.

It all depends: say your service cannot tolerate a 50 s delay caused by some 
"disturbance". Will fencing (let's say after 40 seconds) bring the service up 
again before those 50 seconds have elapsed? Or, even worse: if the fenced node 
is needed to run the service, will it have completed its reboot and be running 
the cluster services before the 50 seconds are up?
I'm afraid for most services the answer will be "no".
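
To illustrate with made-up numbers (fencing after 40 s, reboot plus cluster
start assumed to take another 60-120 s), for the case where the fenced node
itself has to run the service:

    0 s           the device/storage stalls
    40 s          the node is fenced
    ~100-160 s    the node has rebooted, rejoined the cluster and can run
                  the service again

So fencing only pays off here if the stall would have lasted well beyond
those 100-160 seconds.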

However, once you have already waited 40 seconds, things may well continue 
after 50 seconds altogether, or it could be that you'll have to wait for 
hours.
Unfortunately no software can predict the future, so there is some likelihood 
that fencing now is better than continuing to wait. However (for the same 
reason), there's no guarantee that fencing will improve the situation: the 
shared storage system may still be "pausing", and then no node can continue.
It's all a bit like hard-mounting NFS exported from an HA-NFS server: you 
hope the server will be back soon while you wait, even though in some cases 
it would be better to get an I/O error delivered to the application, allowing 
it to react...
It all depends...

(on the dmsetup example providing a bad disk)
The example was not intended for sbd; it was for a program of mine that had to 
deal with read/write errors.
For sbd, dm-flakey (or dm-dust) or dm-delay might be better, but those are 
still very deterministic (unless updated at runtime via "dmsetup message").
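
For what it's worth, a rough, untested sketch of how such a test device could
be set up. The mapping names, /dev/sdX and all the numbers are just
placeholders, and of course only do this on a disposable test device:

    # size of the backing device in 512-byte sectors
    SIZE=$(blockdev --getsz /dev/sdX)

    # dm-flakey: behaves normally for 55 s, then fails all I/O for 5 s, repeating
    dmsetup create sbd_flaky --table "0 $SIZE flakey /dev/sdX 0 55 5"

    # dm-delay: delay every I/O by 500 ms instead of failing it
    #dmsetup create sbd_slow --table "0 $SIZE delay /dev/sdX 0 500"

    # dm-dust can be changed at runtime via "dmsetup message",
    # e.g. mark block 60 bad and start returning read errors for it:
    #dmsetup create sbd_dust --table "0 $SIZE dust /dev/sdX 0 512"
    #dmsetup message sbd_dust 0 addbadblock 60
    #dmsetup message sbd_dust 0 enable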

Regards,
Ulrich


