Thorsten, can you send the crm_report archive to the list, please? I had a
look, and as far as I can tell the only reason drbd isn't being promoted is
that it has a promotion score of -1 - which is definitely linbit's
department :-)
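
For the archives, in case it helps: the promotion score is the per-node
master preference that ocf:linbit:drbd pushes into the CIB via crm_master.
A quick, hedged way to inspect it on this setup (the attribute name
master-drbd_disk follows the agent's usual master-<resource-id> convention,
and the flags below are from pacemaker 1.1 - adjust if your build differs):

# ptest -L -s | grep -i promotion
# crm_attribute -N iscsi2 -n master-drbd_disk -l reboot -G

The first command recomputes scores from the live CIB and prints the
promotion scores; the second queries the transient node attribute the
agent set on iscsi2.
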
On Thu, Sep 16, 2010 at 12:14 PM, Thorsten Scherf <tsch...@redhat.com> wrote:
> On [Thu, 16.09.2010 11:21], Andrew Beekhof wrote:
>>
>> Technically the subject is incorrect - it's a drbd issue.
>
> ack. :)
>
>> Can someone from linbit have a look?
>
> Actually, this seems to be a problem only with fence_ack_manual. I've
> tested with a different fence device, and that worked without problems.
>
>
>> On Wed, Sep 15, 2010 at 8:43 PM, Thorsten Scherf <tsch...@redhat.com>
>> wrote:
>>>
>>> Hey,
>>>
>>> I'm currently trying the latest pacemaker RPM on Fedora rawhide
>>> together with cman/OpenAIS:
>>>
>>> cman-3.0.16-1.fc15.i686
>>> openais-1.1.4-1.fc15.i686
>>> pacemaker-1.1.2-7.fc13.i386 (rebuilt from the rhel6 beta)
>>>
>>> I have a very basic cluster.conf (only for testing):
>>>
>>> # cat /etc/cluster/cluster.conf
>>> <?xml version="1.0"?>
>>> <cluster name="iscsicluster" config_version="2">
>>>   <cman two_node="1" expected_votes="1"/>
>>>   <clusternodes>
>>>     <clusternode name="iscsi1" votes="1" nodeid="1">
>>>       <fence>
>>>         <method name="1">
>>>           <device name="manual" nodename="iscsi1"/>
>>>         </method>
>>>       </fence>
>>>     </clusternode>
>>>     <clusternode name="iscsi2" votes="1" nodeid="2">
>>>       <fence>
>>>         <method name="1">
>>>           <device name="manual" nodename="iscsi2"/>
>>>         </method>
>>>       </fence>
>>>     </clusternode>
>>>   </clusternodes>
>>>   <fencedevices>
>>>     <fencedevice agent="fence_manual" name="manual"/>
>>>   </fencedevices>
>>>   <rm/>
>>> </cluster>
>>>
>>> The pacemaker config looks like this:
>>>
>>> # crm configure show
>>> node iscsi1
>>> node iscsi2
>>> primitive drbd_disk ocf:linbit:drbd \
>>>         params drbd_resource="virt_machines" \
>>>         op monitor interval="15s"
>>> primitive ip_drbd ocf:heartbeat:IPaddr2 \
>>>         params ip="192.168.122.100" cidr_netmask="24" \
>>>         op monitor interval="10s"
>>> primitive iscsi_lsb lsb:tgtd \
>>>         op monitor interval="10s"
>>> group rg_iscsi iscsi_lsb ip_drbd \
>>>         meta target-role="Started"
>>> ms ms_drbd_disk drbd_disk \
>>>         meta master-max="1" master-node-max="1" clone-max="2" \
>>>         clone-node-max="1" notify="true" target-role="Master"
>>> location cli-prefer-rg_iscsi rg_iscsi \
>>>         rule $id="cli-prefer-rule-rg_iscsi" inf: #uname eq iscsi2
>>> colocation c_iscsi_on_drbd inf: rg_iscsi ms_drbd_disk:Master
>>> order o_drbd_before_iscsi inf: ms_drbd_disk:promote rg_iscsi:start
>>> property $id="cib-bootstrap-options" \
>>>         dc-version="1.1.2-f059ec7ced7a86f18e5490b67ebf4a0b963bccfe" \
>>>         cluster-infrastructure="cman" \
>>>         stonith-enabled="false" \
>>>         no-quorum-policy="ignore"
>>>
>>> This works fine so far:
>>>
>>> # crm_mon
>>> ============
>>> Last updated: Wed Sep 15 18:06:42 2010
>>> Stack: cman
>>> Current DC: iscsi1 - partition with quorum
>>> Version: 1.1.2-f059ec7ced7a86f18e5490b67ebf4a0b963bccfe
>>> 2 Nodes configured, unknown expected votes
>>> 2 Resources configured.
>>> ============
>>>
>>> Online: [ iscsi1 iscsi2 ]
>>>
>>> Resource Group: rg_iscsi
>>>     iscsi_lsb   (lsb:tgtd):               Started iscsi1
>>>     ip_drbd     (ocf::heartbeat:IPaddr2): Started iscsi1
>>> Master/Slave Set: ms_drbd_disk
>>>     Masters: [ iscsi1 ]
>>>     Slaves: [ iscsi2 ]
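
As a sanity check at this point, crm_mon's view should agree with drbd
itself. On either node (resource name taken from the config above; the
expected output assumes iscsi1 currently holds the Primary role):

# drbdadm role virt_machines      (expect Primary/Secondary on iscsi1)
# drbdadm dstate virt_machines    (expect UpToDate/UpToDate)
# cat /proc/drbd                  (the kernel's view of the same state)
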
>>> For testing, no real fence device is configured; I'm using
>>> fence_ack_manual to confirm the node shutdown, but that's exactly the
>>> problem: when I switch off iscsi1, no resource failover happens after
>>> I call fence_ack_manual:
>>>
>>> /var/log/messages:
>>> Sep 15 18:09:02 iscsi2 fenced[1171]: fence iscsi1 failed
>>>
>>> # fence_ack_manual
>>>
>>> /var/log/messages:
>>> Sep 15 18:09:08 iscsi2 fenced[1171]: fence iscsi1 overridden by
>>> administrator intervention
>>>
>>> # crm_mon
>>> ============
>>> Last updated: Wed Sep 15 18:09:26 2010
>>> Stack: cman
>>> Current DC: iscsi2 - partition with quorum
>>> Version: 1.1.2-f059ec7ced7a86f18e5490b67ebf4a0b963bccfe
>>> 2 Nodes configured, unknown expected votes
>>> 2 Resources configured.
>>> ============
>>>
>>> Online: [ iscsi2 ]
>>> OFFLINE: [ iscsi1 ]
>>>
>>> Master/Slave Set: ms_drbd_disk
>>>     Slaves: [ iscsi2 ]
>>>     Stopped: [ drbd_disk:0 ]
>>>
>>> Failed actions:
>>>     drbd_disk:1_promote_0 (node=iscsi2, call=11, rc=1,
>>> status=complete): unknown error
>>>
>>> The output of cibadmin -Q is available here:
>>> http://pastebin.com/gRUwwVFF
>>>
>>> I'm wondering why no service failover happened after I manually
>>> confirmed the shutdown of the first node with fence_ack_manual.
>>>
>>> Maybe someone knows what's going on?!
>>>
>>> Cheers,
>>> Thorsten
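
A hedged debugging checklist for the failed promote, reusing the names from
the config above: drbd normally refuses to become Primary while its data is
not UpToDate (e.g. Consistent or Outdated after losing the peer), which
would match the rc=1 above.

# drbdadm role virt_machines      (on iscsi2; likely Secondary/Unknown here)
# drbdadm dstate virt_machines    (promotion needs UpToDate data)
# grep drbd /var/log/messages | tail -20   (the agent logs why promote failed)

Once the underlying cause is addressed, clearing the failed action lets
pacemaker retry the promotion:

# crm resource cleanup drbd_disk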