[Pacemaker] new node causes spurious evil

Matthew O'Connor Fri, 11 May 2012 21:54:16 -0700

My question: Why will a node that is not allowed to start a resource attempt to start a monitor on that resource? Is there a way to change this behavior? (Specific monitors in question: ocf:heartbeat:iSCSITarget and ocf:heartbeat:iSCSILogicalUnit)


The details:

I have two nodes, ds01 and ds02, running and happy, and when adding a third node called gw05, things start falling apart. I've configured an asymmetric opt-in cluster per the documentation, and have explicit rules about what can start where. ds01 and ds02 are configured with a variety of resources. gw05 is not configured with any - it's effectively a blank node.
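
For reference, an opt-in setup of this kind looks roughly like the crm shell sketch below. The group name g_iscsi_a, the constraint ids, and the scores are illustrative placeholders, not copied from my actual configuration:

```
# Illustrative sketch only: names and scores are assumptions.
# Opt-in (asymmetric) cluster: nothing runs anywhere unless a
# location constraint gives the node a positive score.
property symmetric-cluster="false"

# Allow the group on the two storage nodes only.
location loc_g_iscsi_a_ds01 g_iscsi_a 200: ds01
location loc_g_iscsi_a_ds02 g_iscsi_a 100: ds02

# gw05 gets no location constraint at all, so it should never be
# chosen to start anything.
```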

With ds01 and ds02 running and in a stable state with their resources, bringing gw05 online (even in standby-mode) causes many things to fall apart. First, a monitor error on gw05 for a resource that wasn't supposed to even run there. The monitor error belonged to a group that was alive and well on ds01; the group died, but one of the group members was left alive on ds01 (?!). Nothing could be migrated to ds02, or away from gw05. After pulling a "service pacemaker stop" on the command line and doing a resource cleanup on the group from one of the remaining ds?? nodes, everything went back to normal.

(I've simplified the details here - the actual configuration is slightly more complex with two resource groups instead of one. Both groups die, one group completely and the other has the dangling ip-address resource on the node it started on. gw05 never starts anything, and isn't supposed to, but it's the one reporting the errors and evidently killing the resources.)

Now, I've tried location statements to explicitly exclude gw05 from starting any of the resources it's complaining about, and used copious order and colocation statements, to no avail. The kicker is: when I finally gave in and installed one "missing" package (that should not have been required on gw05), the monitor worked again and things stopped failing.

More Specifics: packages iscsitarget and iscsitarget-dkms were required for gw05 to stop killing my resources. I have an ocf:iSCSITarget, iSCSILogicalUnit, and virtual ip address in each of two groups. ds01 and ds02 share the load for these groups, and are the ONLY nodes allowed to run them. gw05 should not even be trying to start these, let alone ANY resources/monitors in those groups IMO. Using -inf location statements for both the group and for the group members had no effect. This effectively suggests to me that any new node I bring into the cluster will need to have these extra packages installed.
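
The -inf statements I tried were along these lines, shown here in crm shell syntax. The resource ids (g_iscsi_a, p_target_a, p_lun_a, p_ip_a) and constraint ids are hypothetical stand-ins for the real ones:

```
# Illustrative sketch only: resource and constraint ids are assumptions.
# Ban the whole group from gw05 ...
location loc_no_g_iscsi_a_gw05 g_iscsi_a -inf: gw05

# ... and, separately, each member of the group.
location loc_no_target_a_gw05 p_target_a -inf: gw05
location loc_no_lun_a_gw05 p_lun_a -inf: gw05
location loc_no_ip_a_gw05 p_ip_a -inf: gw05
```

Neither form stopped the monitor errors from gw05.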

If this is an RTFM question, I apologize. I've been reading it, honestly, and this behavior totally bewilders me. Would setting is-managed="false" in the resource defaults help? I'm almost loath to add another step to the current "turn this resource on here" chain.


Thanks!
-- Matt


_______________________________________________
Pacemaker mailing list: Pacemaker@oss.clusterlabs.org
http://oss.clusterlabs.org/mailman/listinfo/pacemaker

Project Home: http://www.clusterlabs.org
Getting started: http://www.clusterlabs.org/doc/Cluster_from_Scratch.pdf
Bugs: http://bugs.clusterlabs.org
