On 04/07/2016 03:22 PM, Ferenc Wágner wrote:
> Hi,
>
> On a freshly rebooted cluster node (after crm_mon reports it as
> 'online'), I get the following:
>
> wferi@vhbl08:~$ sudo crm_resource -r vm-cedar --cleanup
> Cleaning up vm-cedar on vhbl03, removing fail-count-vm-cedar
> Cleaning up vm-cedar on vhbl04, removing fail-count-vm-cedar
> Cleaning up vm-cedar on vhbl05, removing fail-count-vm-cedar
> Cleaning up vm-cedar on vhbl06, removing fail-count-vm-cedar
> Cleaning up vm-cedar on vhbl07, removing fail-count-vm-cedar
> Cleaning up vm-cedar on vhbl08, removing fail-count-vm-cedar
> Waiting for 6 replies from the CRMd..No messages received in 60 seconds..
> aborting
>
> Meanwhile, this is written into syslog (I can also provide info level
> logs if necessary):
>
> 22:03:02 vhbl08 crmd[8990]: error: Cannot route message to unknown node vhbl03
> 22:03:02 vhbl08 crmd[8990]: error: Cannot route message to unknown node vhbl04
> 22:03:02 vhbl08 crmd[8990]: error: Cannot route message to unknown node vhbl06
> 22:03:02 vhbl08 crmd[8990]: error: Cannot route message to unknown node vhbl07
This message can only occur when the node name is not present in this
node's peer cache. I'm guessing that since you don't have node names in
corosync, the cache entries only have node IDs at this point. I don't
know offhand when pacemaker would figure out the association, but I bet
it would be possible to ensure it by running some command beforehand,
maybe crm_node -l?

> 22:03:04 vhbl08 crmd[8990]: notice: Operation vm-cedar_monitor_0: not
> running (node=vhbl08, call=626, rc=7, cib-update=169, confirmed=true)
>
> For background:
>
> wferi@vhbl08:~$ sudo cibadmin --scope=nodes -Q
> <nodes>
>   <node id="167773707" uname="vhbl05">
>     <utilization id="nodes-167773707-utilization">
>       <nvpair id="nodes-167773707-utilization-memoryMiB" name="memoryMiB" value="124928"/>
>     </utilization>
>     <instance_attributes id="nodes-167773707"/>
>   </node>
>   <node id="167773708" uname="vhbl06">
>     <utilization id="nodes-167773708-utilization">
>       <nvpair id="nodes-167773708-utilization-memoryMiB" name="memoryMiB" value="124928"/>
>     </utilization>
>     <instance_attributes id="nodes-167773708"/>
>   </node>
>   <node id="167773706" uname="vhbl04">
>     <utilization id="nodes-167773706-utilization">
>       <nvpair id="nodes-167773706-utilization-memoryMiB" name="memoryMiB" value="124928"/>
>     </utilization>
>     <instance_attributes id="nodes-167773706"/>
>   </node>
>   <node id="167773705" uname="vhbl03">
>     <utilization id="nodes-167773705-utilization">
>       <nvpair id="nodes-167773705-utilization-memoryMiB" name="memoryMiB" value="124928"/>
>     </utilization>
>     <instance_attributes id="nodes-167773705"/>
>   </node>
>   <node id="167773709" uname="vhbl07">
>     <utilization id="nodes-167773709-utilization">
>       <nvpair id="nodes-167773709-utilization-memoryMiB" name="memoryMiB" value="124928"/>
>     </utilization>
>     <instance_attributes id="nodes-167773709"/>
>   </node>
>   <node id="167773710" uname="vhbl08">
>     <utilization id="nodes-167773710-utilization">
>       <nvpair id="nodes-167773710-utilization-memoryMiB" name="memoryMiB" value="124928"/>
>     </utilization>
>   </node>
> </nodes>
>
> Why does this happen? I've got no node names in corosync.conf, but
> Pacemaker defaults to uname -n all right.
> _______________________________________________
> Users mailing list: [email protected]
> http://clusterlabs.org/mailman/listinfo/users
>
> Project Home: http://www.clusterlabs.org
> Getting started: http://www.clusterlabs.org/doc/Cluster_from_Scratch.pdf
> Bugs: http://bugs.clusterlabs.org
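P.S. If the missing name/ID association is indeed the cause, one untested way to make it explicit would be a nodelist in corosync.conf, so the peer cache never has to guess. A sketch below; the nodeids are the ones from your CIB, but the ring0_addr values are placeholders I made up, so substitute your real ring addresses (the remaining nodes would follow the same pattern):

```
nodelist {
    node {
        # ring0_addr is a placeholder; use the node's actual address or
        # a resolvable hostname for ring 0
        ring0_addr: vhbl03
        nodeid: 167773705
        name: vhbl03
    }
    node {
        ring0_addr: vhbl04
        nodeid: 167773706
        name: vhbl04
    }
    # ... and likewise for vhbl05 through vhbl08
}
```

With explicit name: entries, pacemaker can map corosync node IDs to names as soon as it reads the membership, instead of learning the names later.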
