On Wed, 2018-07-25 at 23:43 +0800, 李培 wrote:
> Dear all,
>
> I have a problem when I use Pacemaker.
>
> The corosync.log on two nodes grows to 1 GB in about one hour.
>
> On the node named paas-controller-22-0-2-12, corosync.log contains
> only one kind of message, as below:
> Jul 23 14:00:06 [15036] paas-controller-22-0-2-12 cib: error: cib_process_shutdown_req: Shutdown ACK from 22.0.2.11 - not shutting down
> Jul 23 14:00:06 [15036] paas-controller-22-0-2-12 cib: error: cib_process_shutdown_req: Shutdown ACK from 22.0.2.11 - not shutting down
> Jul 23 14:00:06 [15036] paas-controller-22-0-2-12 cib: error: cib_process_shutdown_req: Shutdown ACK from 22.0.2.11 - not shutting down
> Jul 23 14:00:06 [15036] paas-controller-22-0-2-12 cib: error: cib_process_shutdown_req: Shutdown ACK from 22.0.2.11 - not shutting
>
> On the other node, named paas-controller-22-0-2-11, corosync.log also
> contains only one kind of message, as below:
> Jul 23 14:00:06 [15036] paas-controller-22-0-2-11 cib: error: cib_process_shutdown_req: Shutdown ACK from 22.0.2.12 - not shutting down
> Jul 23 14:00:06 [15036] paas-controller-22-0-2-11 cib: error: cib_process_shutdown_req: Shutdown ACK from 22.0.2.12 - not shutting down
> Jul 23 14:00:06 [15036] paas-controller-22-0-2-11 cib: error: cib_process_shutdown_req: Shutdown ACK from 22.0.2.12 - not shutting down
> Jul 23 14:00:06 [15036] paas-controller-22-0-2-11 cib: error: cib_process_shutdown_req: Shutdown ACK from 22.0.2.12 - not shutting
>
> It seems that the two nodes do not respond to each other's shutdown
> requests, so the message keeps being sent out.
>
> Have any of you ever encountered this issue?
>
> How did it happen? How can it be solved?
>
> I am looking forward to hearing from you.
>
> Thanks in advance.
>
> Sincerely yours

This is interesting. At least one of the nodes should have an info-level
log message like "Shutdown REQ from ..." before these messages start.

For this to happen, one of the nodes has to receive a shutdown request
from the other and acknowledge it with a reply. The node that sent the
request then somehow doesn't know it sent one, and so logs this message.

The funny (?) part is that it will reply to the acknowledgement, and the
other node will (wrongly) treat that as a reply to one of its own
shutdown requests. It has no such request outstanding, so it logs this
message and replies back. Infinite loop :-/

I've opened a bug for the loop:

https://bugs.clusterlabs.org/show_bug.cgi?id=5361

However, an unanswered question is how the loop got started. One of the
nodes thought it received a shutdown request, but the other node didn't
think it sent one. That is a mystery here. If you can find the "Shutdown
REQ" message, the logs from both nodes around that time might shed some
light.
-- 
Ken Gaillot <[email protected]>

_______________________________________________
Users mailing list: [email protected]
https://lists.clusterlabs.org/mailman/listinfo/users

Project Home: http://www.clusterlabs.org
Getting started: http://www.clusterlabs.org/doc/Cluster_from_Scratch.pdf
Bugs: http://bugs.clusterlabs.org
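
P.S. Since the logs are around 1 GB, finding the first "Shutdown REQ" line by hand may be painful. A minimal sketch of one way to do it, reading the file line by line rather than loading it whole. The function names, the default context sizes, and the idea of passing the log path as an argument are all assumptions for illustration, not part of Pacemaker; the search string is simply the message text quoted above, and it will only match if info-level messages were actually being written to that log.

```python
from collections import deque
from typing import Iterable, List, Optional


def find_first_with_context(
    lines: Iterable[str],
    needle: str = "Shutdown REQ",  # message text quoted above
    before: int = 5,               # hypothetical default context sizes
    after: int = 5,
) -> Optional[List[str]]:
    """Return the first line containing `needle`, together with up to
    `before` preceding and `after` following lines, or None if absent."""
    recent = deque(maxlen=before)  # rolling window of preceding lines
    it = iter(lines)
    for line in it:
        if needle in line:
            context = list(recent) + [line]
            # Pull at most `after` more lines from the same iterator.
            for _, following in zip(range(after), it):
                context.append(following)
            return context
        recent.append(line)
    return None


def scan_file(path: str) -> Optional[List[str]]:
    """Stream a (possibly very large) log file through the search."""
    with open(path, errors="replace") as log:
        return find_first_with_context(log)
```

Running `scan_file()` on the corosync.log of each node and comparing the timestamps of the two results should show which node logged the request and what the other node was doing at that moment.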
