Andrew, All,

Please have a look at the patches I queued up here:
https://github.com/lge/pacemaker/commits/for-beekhof
Most (not all) are specific to the heartbeat cluster stack.

Thanks,
Lars

A few comments here:

-----

This effectively changes crm_mon output,
but also changes logging wherever this method is invoked:

Low: native_print: report target-role as well

This is for the "Why does my resource not start?" guys
who forgot to remove the limiting target-role setting.

Report target role (unless "Started", which is the default anyway),
if it limits our abilities (Slave, Stopped),
or if it differs from the current status.

-----

Heartbeat specific:

Low: allow heartbeat to spawn the pengine itself, and tell crmd about it

Heartbeat 3.0.6 now may spawn the pengine directly,
and will announce this in the environment --
I introduced the setting "crmd_spawns_pengine".

This improves shutdown behavior. Otherwise I regularly find an
orphaned pengine process after pacemaker shutdown.

-----

Heartbeat specific, as a consequence of the fix below:

Low: add debugging aid to help spot missing set_msg_callback()s on heartbeat

In ha_msg_dispatch(), change from rcvmsg() to readmsg().

Internally, rcvmsg() is simply a wrapper around readmsg() that
silently deletes messages without a matching callback.

Use readmsg() directly here. It will only return messages that were
not processed by callbacks, so log a warning, notice or debug message
depending on message header information, and ha_msg_del() it ourselves.

-----

Heartbeat specific bug fix:

High: fix stonith ignoring its own messages on heartbeat

Since the introduction of the additional F_TYPE messages
T_STONITH_NOTIFY and T_STONITH_TIMEOUT_VALUE,
and their use as message types in global heartbeat cluster messages,
stonith-ng was broken on the heartbeat cluster stack.

When delegation was made the default, and the result could only be
reaped by listening for the T_STONITH_NOTIFY message, no-one (but
stonithd itself) would ever notice successful completion,
and stonith would be re-issued forever.
Registering callbacks for these F_TYPEs fixes the hung stonith and
stonith_admin operations on the heartbeat cluster stack.

-----

Heartbeat specific:

Medium: fix tracking of peer client process status on heartbeat

Don't optimistically assume that peer client processes are alive,
or that a node that can talk to us is in fact a member of the same
ccm partition.

Whenever ccm tells us about a new membership, *ask* for peer client
process status.

-----

This one-liner may well be relevant for corosync CPG as well, and is
possibly one of the reasons pcmk_cpg_membership() has this funny
"appears to be online even though we think it is dead" block?

fix crm_update_peer_proc to NOT ignore flags if partially set

The "set_bit()" function used here actually deals with masks, not bit numbers.
The "flag" argument should in fact be plural: flags.

These proc flag bits are not always set one at a time,
but for example as "crm_proc_crmd | crm_proc_cpg",
and not necessarily cleared with the same combination.

Ignoring to-be-set flags just because *some* of the flag bits are
already set is clearly a bug, and may be the reason for stale process
cache information.

-----

Heartbeat specific:

Medium: map heartbeat JOIN/LEAVE status to ONLINE/OFFLINE

The rest of the code deals in "online" and "offline", not "join" and "leave".
These states need to be mapped, or the rest of the code won't work properly.

-----

Generic: if shutdown is requested before the stonith connection was
ever established (due to other problems), insisting on retrying the
stonith connection confused the shutdown.

Medium: don't trigger a stonith_reconnect if no longer required

Get rid of some spurious error messages, and speed up shutdown,
even if the connection to the stonith daemon failed.
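
As an aside on the crm_update_peer_proc fix above, here is a minimal
standalone sketch of the difference between an "any bit set" and an
"all bits set" check when the argument is a mask. This is not the actual
pacemaker code: the flag values and the names demo_proc_crmd,
demo_proc_cpg, set_procs_any_check() and set_procs_all_check() are made
up for illustration, and I am assuming the old check skipped the update
whenever any of the requested bits was already present.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical flag values, for illustration only. */
enum {
    demo_proc_crmd = 0x0200,
    demo_proc_cpg  = 0x0400
};

/* Assumed shape of the bug: the update is skipped as soon as ANY of
 * the requested bits is already set, so the remaining bits are lost.
 * Returns true if the process mask was changed. */
static bool set_procs_any_check(uint32_t *processes, uint32_t flags)
{
    if ((*processes & flags) == 0) {
        *processes |= flags;
        return true;
    }
    return false;
}

/* Sketch of the fix: the update is skipped only if ALL of the
 * requested bits are already set.
 * Returns true if the process mask was changed. */
static bool set_procs_all_check(uint32_t *processes, uint32_t flags)
{
    if ((*processes & flags) != flags) {
        *processes |= flags;
        return true;
    }
    return false;
}
```

Starting from a mask with only demo_proc_crmd set and requesting
"demo_proc_crmd | demo_proc_cpg", the first variant reports no change
and leaves the cpg bit unset, while the second one sets it.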
-----

Non-functional change, just for readability:

Low: use CRM_NODE_MEMBER, not CRM_NODE_ACTIVE

ACTIVE is defined to be MEMBER anyway:
include/crm/cluster.h:#define CRM_NODE_ACTIVE CRM_NODE_MEMBER

Don't confuse the reader of the code by implying it was something different.

-----

Heartbeat specific, packaging only:

Low: heartbeat 3.0.6 knows how to find the daemons; drop compat symlinks