Hi, I have a two-node cluster running a Master/Slave DRBD resource and a filesystem resource (there will be more resources later). Here are the details of the software I'm using: Debian 5.0.3 stable, DRBD 8.3.7 compiled from source, Corosync 1.1.2 and Pacemaker 1.0.6 installed from the madkiss repository.
I have a problem that I haven't been able to solve in two days and a lot of digging on the internet. The cluster works correctly in all cases but one: node1 runs the DRBD Primary and has the fs resource mounted, then I simulate a power loss (pull the power cord of node1) and node2 takes over all resources, promotes DRBD and mounts the fs (so far so good). Then I simulate a power loss again by unplugging the power cord of node2. Then I power on node1; it boots, loads its stuff and starts corosync, and the cluster resource manager promotes DRBD to Primary on node1 (it should not!). That is a disaster, because I intend to run an SQL database on that cluster and that way I might lose a huge amount of data.

I also have an ancient two-node cluster running heartbeat 1 and drbd 6.x with the drbddisk resource, and its behaviour in that case is to stop and ask "My data may be outdated, are you sure you want to continue?". I tried the same scenario without the cluster engine (that is the old way, isn't it?) - I enabled the DRBD init scripts and repeated the same steps. In that case DRBD stopped, waited for the other node and asked if I want to continue (good boy, that is exactly what I want!). So my problem must be somewhere in the configuration of the resources, but I can't understand what I'm doing wrong.

So, let me ask straight: how do I do this in Pacemaker? I just want node1 to stop and wait for my confirmation of what to do, or something of that sort, but never ever promote DRBD to Master on its own!

If somebody wonders why I test this scenario, let me explain. My company owns an APC Smart-UPS which, in case of a power loss, shuts down one node of each cluster once the battery level falls below a certain level (we have two pairs of clusters in separate VLANs, so I can't create a 4-node cluster, which would solve this problem, at least partially). If the battery runs below the critical level, the UPS kills all servers except our logging server and one of the DB nodes. If the power doesn't come back, the UPS then kills the last DB node; the only machine that waits for its death is our logging server. When the power comes back, the UPS starts all the servers that are down. If that happens when all nodes are down, we end up with the following situation: the first node that comes up becomes SyncSource, and that node may not be the last one to have survived the UPS rage.

One of the possible solutions is to use the old heartbeat resource manager agent drbddisk, which relies on the DRBD init script.
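Something like this rough, untested sketch is what I have in mind (the names drbddisk-db and gr-db are just made up, and as far as I understand the legacy heartbeat-class agent takes the DRBD resource name as a positional parameter):

primitive drbddisk-db heartbeat:drbddisk \
        params 1="drbd0"
group gr-db drbddisk-db fs-db

The filesystem would then simply follow drbddisk-db around instead of being colocated with the Master role of a master/slave resource, and the decision whether it is safe to become Primary would stay with the DRBD init script.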
But I don't like it :)

Here are my configs:

corosync.conf:

totem {
        version: 2
        token: 3000
        token_retransmits_before_loss_const: 10
        join: 60
        consensus: 1500
        vsftype: none
        max_messages: 20
        clear_node_high_bit: yes
        secauth: off
        threads: 0
        rrp_mode: passive

        # external interface
        interface {
                ringnumber: 0
                bindnetaddr: 10.0.30.0
                mcastaddr: 226.94.1.1
                mcastport: 5405
        }

        # internal interface
        interface {
                ringnumber: 1
                bindnetaddr: 10.2.2.0
                mcastaddr: 226.94.2.1
                mcastport: 5405
        }
}

amf {
        mode: disabled
}

service {
        ver: 0
        name: pacemaker
}

aisexec {
        user: root
        group: root
}

logging {
        fileline: off
        to_stderr: yes
        to_logfile: no
        to_syslog: yes
        syslog_facility: daemon
        debug: off
        timestamp: on
        logger_subsys {
                subsys: AMF
                debug: off
                tags: enter|leave|trace1|trace2|trace3|trace4|trace6
        }
}

-------------
drbd.conf:

common {
        syncer { rate 100M; }
}

resource drbd0 {
        protocol C;

        handlers {
                fence-peer "/usr/lib/drbd/crm-fence-peer.sh";
                after-resync-target "/usr/lib/drbd/crm-unfence-peer.sh";
                pri-on-incon-degr "echo o > /proc/sysrq-trigger ; halt -f";
                pri-lost-after-sb "echo o > /proc/sysrq-trigger ; halt -f";
                local-io-error "echo o > /proc/sysrq-trigger ; halt -f";
                outdate-peer "/usr/lib/heartbeat/drbd-peer-outdater -t 5";
                pri-lost "echo pri-lost. Have a look at the log files. | mail -s 'DRBD Alert' root";
        }

        startup {
                wfc-timeout 0;
                degr-wfc-timeout 0;
        }

        disk {
                on-io-error detach;
                fencing resource-only;
        }

        net {
                sndbuf-size 1024k;
                timeout 20;        # 2 seconds (unit = 0.1 seconds)
                connect-int 10;    # 10 seconds (unit = 1 second)
                ping-int 3;        # 3 seconds (unit = 1 second)
                ping-timeout 5;    # 500 ms (unit = 0.1 seconds)
                ko-count 4;
                cram-hmac-alg "sha1";
                shared-secret "password";
                after-sb-0pri disconnect;
                after-sb-1pri disconnect;
                after-sb-2pri disconnect;
                rr-conflict disconnect;
        }

        syncer {
                rate 100M;
        }

        on db1 {
                device /dev/drbd0;
                disk /dev/db/db;
                address 10.2.2.1:7788;
                flexible-meta-disk internal;
        }

        on db2 {
                device /dev/drbd0;
                disk /dev/db/db;
                address 10.2.2.2:7788;
                meta-disk internal;
        }
}

-----------------------
crm:

crm(live)# configure show
node db1 \
        attributes standby="off"
node db2 \
        attributes standby="off"
primitive drbd-db ocf:linbit:drbd \
        params drbd_resource="drbd0" \
        op monitor interval="15s" role="Slave" timeout="30" \
        op monitor interval="16s" role="Master" timeout="30"
primitive fs-db ocf:heartbeat:Filesystem \
        params fstype="ext3" directory="/db" device="/dev/drbd0"
primitive ip-dbclust.v52 ocf:heartbeat:IPaddr2 \
        params ip="10.0.30.211" broadcast="10.0.30.255" nic="eth1" cidr_netmask="24" \
        op monitor interval="21s" timeout="5s"
ms ms-db drbd-db \
        meta master-max="1" master-node-max="1" clone-max="2" clone-node-max="1" notify="true" target-role="Started"
location drbd-fence-by-handler-ms-db ms-db \
        rule $id="drbd-fence-by-handler-rule-ms-db" $role="Master" -inf: #uname ne db1
location lo-ms-db ms-db \
        rule $id="ms-db-loc-rule" -inf: #uname ne db1 and #uname ne db2
colocation fs-on-drbd0 inf: fs-db ms-db:Master
colocation ip-on-drbd0 inf: ip-dbclust.v52 ms-db:Master
order or-drbd-bf-fs inf: ms-db:promote fs-db:start
order or-drbd-bf-ip inf: ms-db:promote ip-dbclust.v52:start
property $id="cib-bootstrap-options" \
        no-quorum-policy="ignore" \
        stonith-enabled="false" \
        expected-quorum-votes="2" \
        last-lrm-refresh="1264523323" \
        dc-version="1.0.6-cebe2b6ff49b36b29a3bd7ada1c4701c7470febe" \
        cluster-infrastructure="openais"

I hope somebody can help me, I am completely lost :(