Dear Andrei,

I'm sorry about the screenshot; it's the only thing I have left after the crash.
What would be the best course of action in this situation? We don't have a STONITH device, but the local network is still up (both nodes can see each other). Also, what does "(blocked)" mean?

Sincerely,
Ark.

On Sun, May 5, 2019 at 9:46 PM Andrei Borzenkov wrote:
> 05.05.2019 16:14, Arkadiy Kulev wrote:
> > Hello!
> >
> > I run pacemaker on 2 active/active hosts which balance the load of 2 public
> > IP addresses.
> > A few days ago we ran a very CPU/network intensive process on one of the 2
> > hosts and Pacemaker failed.
> >
> > I've attached a screenshot of the terminal to this email.
> >
> > The "Failed Actions" shows that the IPaddr2 "monitor_30000" failed with
> > "unknown error" and a status of "Timed Out" (queue=0ms exec=0ms). The
> > /etc/init.d LSB script (mycluster) failed as well (and set to blocked).
> >
> > This completely stalled Pacemaker and the second host didn't take over the
> > IP address and gateway settings.
> >
> > Any ideas would be appreciated.
> >
>
> Stop operation failed, you have no stonith, so pacemaker cannot continue
> and is stuck.
>
> >
> > [image: Screen Shot 2019-04-30 at 12.36.34.png]
> >
>
> Images are hard to reply to, consume excessive space and cannot be
> viewed using text only clients. There is no reason to send image when
> you can just copy and paste several lines of text.
> _______________________________________________
> Manage your subscription:
> https://lists.clusterlabs.org/mailman/listinfo/users
>
> ClusterLabs home: https://www.clusterlabs.org/
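For anyone else puzzled by "(blocked)", here is a minimal Python sketch of the mechanism Andrei describes, assuming Pacemaker's documented default on-fail behavior: a failed stop defaults to "fence" when STONITH is enabled and to "block" otherwise, while other operations default to "restart". The function names are hypothetical and nothing here calls a Pacemaker API; it only models the decision in a simplified way.

```python
# Hypothetical, simplified model of Pacemaker's documented default on-fail
# policy. It does not call any Pacemaker API and ignores refinements that
# newer releases make for some operations.


def default_on_fail(operation: str, stonith_enabled: bool) -> str:
    """Return the assumed default on-fail action for a failed operation."""
    if operation == "stop":
        # A failed stop leaves the resource state unknown: fence the node
        # if STONITH is available, otherwise block (take no further action).
        return "fence" if stonith_enabled else "block"
    return "restart"


def recover_after_monitor_failure(stop_succeeded: bool, stonith_enabled: bool) -> str:
    """Hypothetical helper: outcome after a monitor (e.g. monitor_30000) fails.

    The default recovery for a failed monitor is "restart", which begins
    by stopping the resource. If that stop also fails, the stop
    operation's own on-fail policy decides what happens next.
    """
    if stop_succeeded:
        # Stopped cleanly, so the resource can be started again.
        return default_on_fail("monitor", stonith_enabled)
    return default_on_fail("stop", stonith_enabled)


if __name__ == "__main__":
    # The situation in this thread: the stop failed and there is no STONITH.
    print(recover_after_monitor_failure(stop_succeeded=False, stonith_enabled=False))
```

With `stonith_enabled=True` the same failure yields "fence": the surviving node fences its peer and can then take over the resource, which is why fencing is what lets a cluster get past a failed stop instead of blocking.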
