On Thu, 2018-01-04 at 16:47 -0800, Suresh Rajagopalan wrote:
> Hello
>
> I am running a corosync cluster on a pair of CentOS 7.4 machines. We
> frequently see these messages under moderate load, after which
> pacemaker restarts the services. I see an older bug similar to this,
> however that was in 2014 and we are on 7.4 now. Any help is
> appreciated.
>
> Regards
> Suresh
>
> pcmk_dbus_find_error
>
> lrmd: info: pcmk_dbus_find_error: GetUnit error
> 'org.freedesktop.DBus.Error.NoReply': Did not receive a reply.
> Possible causes include: the remote application did not send a reply,
> the message bus security policy blocked the reply, the reply timeout
> expired, or the network connection was broken.

The first thing I'd try is raising the timeout on your systemd resources' start/stop/monitor operations. That should propagate to the DBus calls.

Beyond that, there's not much to do but investigate the system logs and track load levels to figure out where the bottleneck is, and correct it.

--
Ken Gaillot <[email protected]>
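
A minimal sketch of the timeout suggestion above, assuming the cluster is managed with pcs. The resource name `my-systemd-service` and the 120s value are hypothetical placeholders, not taken from the original report. The script only prints the commands so they can be reviewed before being run by hand.

```shell
#!/bin/sh
# Hypothetical values: substitute your own systemd resource id and a
# timeout that suits the load on your nodes.
RESOURCE="my-systemd-service"
TIMEOUT="120s"

# Print (do not execute) a pcs command that raises the timeout of one
# operation on one resource.
#   $1 = resource id, $2 = operation name, $3 = timeout
timeout_cmd() {
    printf 'pcs resource update %s op %s timeout=%s\n' "$1" "$2" "$3"
}

# Cover the three operations mentioned in the reply.
for op in start stop monitor; do
    timeout_cmd "$RESOURCE" "$op" "$TIMEOUT"
done
```

If the resource already has a monitor operation with a specific interval, it is probably safest to pass the same `interval=` value alongside `timeout=` so the existing operation is the one that gets changed. Check the resulting resource configuration afterwards.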
