Hi! I'm trying to set up a 2-node cluster. I'm new to Pacemaker, but I'm gradually getting the hang of it. This problem, however, has me completely at a loss.
I have a cloned tomcat resource, which runs on both nodes and doesn't really depend on anything (it doesn't use DRBD or anything else of that sort). But I'm trying to get Pacemaker to move the cluster IP to the other node in case tomcat fails. Here are the relevant parts of my config:

node srvplan1
node srvplan2
primitive DBIP ocf:heartbeat:IPaddr2 \
        params ip="1.2.3.4" cidr_netmask="24" \
        op monitor interval="10s"
primitive drbd_pgdrive ocf:linbit:drbd \
        params drbd_resource="pgdrive" \
        op start interval="0" timeout="240" \
        op stop interval="0" timeout="100"
primitive pgdrive_fs ocf:heartbeat:Filesystem \
        params device="/dev/drbd0" directory="/hd2" fstype="ext4"
primitive ping ocf:pacemaker:ping \
        params host_list="193.233.59.2" multiplier="1000" \
        op monitor interval="10"
primitive postgresql ocf:heartbeat:pgsql \
        params pgdata="/hd2/pgsql" \
        op monitor interval="30" timeout="30" depth="0" \
        op start interval="0" timeout="60" \
        op stop interval="0" timeout="60" \
        meta target-role="Started"
primitive tomcat ocf:heartbeat:tomcat \
        params java_home="/usr/lib/jvm/jre" catalina_home="/usr/share/tomcat" tomcat_user="tomcat" script_log="/home/tmo/log/tomcat.log" statusurl="http://127.0.0.1:8080/status/" \
        op start interval="0" timeout="60" \
        op stop interval="0" timeout="120" \
        op monitor interval="30" timeout="30"
group postgres pgdrive_fs DBIP postgresql
ms ms_drbd_pgdrive drbd_pgdrive \
        meta master-max="1" master-node-max="1" clone-max="2" clone-node-max="1" notify="true"
clone pings ping \
        meta interleave="true"
clone tomcats tomcat \
        meta interleave="true" target-role="Started"
location DBIPcheck DBIP \
        rule $id="DBIPcheck-rule" 10000: defined pingd and pingd gt 0
location master-prefer-node1 DBIP 50: srvplan1
colocation DBIP-on-web 1000: DBIP tomcats
colocation postgres_on_drbd inf: postgres ms_drbd_pgdrive:Master
order postgres_after_drbd inf: ms_drbd_pgdrive:promote postgres:start

As you can see, there are three explicit constraints for the DBIP resource: a preferred node (srvplan1, score 50), a successful ping (score 10000) and a running tomcat (score 1000). Resource stickiness is also set to 100. Implicit constraints include the colocation of the postgres group with the DRBD master instance. The ping check works fine: if I unplug the external LAN cable or use iptables to block pings, everything gets moved to the other node. The tomcat check isn't working for some reason, though:

[root@srvplan1 bin]# crm_mon -1
============
Last updated: Fri Jun 22 10:06:59 2012
Last change: Fri Jun 22 09:43:16 2012 via cibadmin on srvplan1
Stack: openais
Current DC: srvplan1 - partition with quorum
Version: 1.1.7-2.fc16-ee0730e13d124c3d58f00016c3376a1de5323cff
2 Nodes configured, 2 expected votes
17 Resources configured.
============

Online: [ srvplan1 srvplan2 ]

 Master/Slave Set: ms_drbd_pgdrive [drbd_pgdrive]
     Masters: [ srvplan1 ]
     Slaves: [ srvplan2 ]
 Resource Group: postgres
     pgdrive_fs   (ocf::heartbeat:Filesystem):   Started srvplan1
     DBIP         (ocf::heartbeat:IPaddr2):      Started srvplan1
     postgresql   (ocf::heartbeat:pgsql):        Started srvplan1
 Clone Set: pings [ping]
     Started: [ srvplan1 srvplan2 ]
 Clone Set: tomcats [tomcat]
     Started: [ srvplan2 ]
     Stopped: [ tomcat:0 ]

Failed actions:
    tomcat:0_start_0 (node=srvplan1, call=37, rc=-2, status=Timed Out): unknown exec error

As you can see, tomcat is stopped on srvplan1 (I deliberately messed up its startup scripts), but everything else still runs there.
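(For completeness: once the startup scripts are restored, I assume the fail count can be checked and the failed start cleared with the usual tools, something like the following, i.e. crm_mon's -f option and the crm shell's cleanup command:)

[root@srvplan1 bin]# crm_mon -1 -f
[root@srvplan1 bin]# crm resource cleanup tomcat srvplan1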
ptest -L -s shows:

clone_color: ms_drbd_pgdrive allocation score on srvplan1: 10350
clone_color: ms_drbd_pgdrive allocation score on srvplan2: 10000
clone_color: drbd_pgdrive:0 allocation score on srvplan1: 10100
clone_color: drbd_pgdrive:0 allocation score on srvplan2: 0
clone_color: drbd_pgdrive:1 allocation score on srvplan1: 0
clone_color: drbd_pgdrive:1 allocation score on srvplan2: 10100
native_color: drbd_pgdrive:0 allocation score on srvplan1: 10100
native_color: drbd_pgdrive:0 allocation score on srvplan2: 0
native_color: drbd_pgdrive:1 allocation score on srvplan1: -INFINITY
native_color: drbd_pgdrive:1 allocation score on srvplan2: 10100
drbd_pgdrive:0 promotion score on srvplan1: 30700
drbd_pgdrive:1 promotion score on srvplan2: 30000
group_color: postgres allocation score on srvplan1: 0
group_color: postgres allocation score on srvplan2: 0
group_color: pgdrive_fs allocation score on srvplan1: 100
group_color: pgdrive_fs allocation score on srvplan2: 0
group_color: DBIP allocation score on srvplan1: 10150
group_color: DBIP allocation score on srvplan2: 10000
group_color: postgresql allocation score on srvplan1: 100
group_color: postgresql allocation score on srvplan2: 0
native_color: pgdrive_fs allocation score on srvplan1: 20450
native_color: pgdrive_fs allocation score on srvplan2: -INFINITY
clone_color: tomcats allocation score on srvplan1: -INFINITY
clone_color: tomcats allocation score on srvplan2: 0
clone_color: tomcat:0 allocation score on srvplan1: -INFINITY
clone_color: tomcat:0 allocation score on srvplan2: 0
clone_color: tomcat:1 allocation score on srvplan1: -INFINITY
clone_color: tomcat:1 allocation score on srvplan2: 100
native_color: tomcat:1 allocation score on srvplan1: -INFINITY
native_color: tomcat:1 allocation score on srvplan2: 100
native_color: tomcat:0 allocation score on srvplan1: -INFINITY
native_color: tomcat:0 allocation score on srvplan2: -INFINITY
native_color: DBIP allocation score on srvplan1: 9250
native_color: DBIP allocation score on srvplan2: -INFINITY
native_color: postgresql allocation score on srvplan1: 100
native_color: postgresql allocation score on srvplan2: -INFINITY
clone_color: pings allocation score on srvplan1: 0
clone_color: pings allocation score on srvplan2: 0
clone_color: ping:0 allocation score on srvplan1: 100
clone_color: ping:0 allocation score on srvplan2: 0
clone_color: ping:1 allocation score on srvplan1: 0
clone_color: ping:1 allocation score on srvplan2: 100
native_color: ping:0 allocation score on srvplan1: 100
native_color: ping:0 allocation score on srvplan2: 0
native_color: ping:1 allocation score on srvplan1: -INFINITY
native_color: ping:1 allocation score on srvplan2: 100

Why is the score for DBIP -INFINITY on srvplan2? The only inf rule in my config is the colocation rule for the postgres group. I can surmise that DBIP can't run on srvplan2 because DRBD isn't Master there, but there's nothing preventing it from being promoted there, and that rule doesn't stop DBIP from being moved when the ping check fails either. So there must be something else. I also don't quite understand why the DBIP score is 9250 on srvplan1. It should be at least 10000 for the ping, plus another 250 for the node preference and stickiness. If I migrate DBIP to srvplan2 manually, its score there is 10200, which makes me think that 1000 gets subtracted because tomcat is stopped on srvplan1. But why? This is a positive rule, not a negative one.
It should just add 1000 if tomcat is running, but it shouldn't subtract anything if it isn't, or am I wrong? Does this have anything to do with the fact that I'm trying to colocate the IP with a clone? Or am I looking in the wrong direction?

I tried removing DBIP from the group, and it got moved to the other node; obviously, everything else was left on the first one. Then I tried adding a colocation of DBIP with the postgres resources (and the other way around), and if the score of that rule is high enough, the IP gets moved back, but I was never able to get postgres moved to the second node (where the IP is) instead.

-- 
Sergey A. Tachenov <stache...@gmail.com>
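P.S. To narrow down which constraint is actually producing the -INFINITY, I assume the same scores can be recomputed offline against a dumped copy of the CIB, removing the suspect constraints from the copy one at a time (crm_simulate should accept a file the same way ptest does):

[root@srvplan1 bin]# cibadmin -Q > /tmp/cib.xml
[root@srvplan1 bin]# crm_simulate -s -x /tmp/cib.xml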
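And in case it clarifies what I mean by colocating DBIP with the postgres resources: the constraints I experimented with looked roughly like the following (the IDs and scores here are just placeholders for illustration). In crm colocation syntax the first resource listed is the one that gets placed relative to the second, so the first rule pulls the IP to wherever postgresql runs, and the second is the reverse dependency:

# illustration only -- real IDs and scores differed
colocation ip-with-pg 2000: DBIP postgresql
colocation pg-with-ip 2000: postgresql DBIP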