I could re-paste the whole thing, but it's easier to just post the link:
http://theclusterguy.clusterlabs.org/post/205886990/advisory-dont-use-pacemaker-on-corosync-yet

On Thu, Sep 24, 2009 at 4:56 PM, Remi Broemeling <r...@nexopia.com> wrote:
> I posted this to the OpenAIS Mailing List
> (open...@lists.linux-foundation.org) yesterday, but haven't received a
> response, and upon further reflection I think I chose the wrong list to
> post it to. That list seems to be far less about user support and far
> more about developer communication. Therefore I am re-trying here, as the
> archives show it to be somewhat more user-focused.
>
> The problem is that corosync refuses to shut down in response to a QUIT
> signal. Given the below cluster (output of crm_mon):
>
> ============
> Last updated: Wed Sep 23 15:56:24 2009
> Stack: openais
> Current DC: boot1 - partition with quorum
> Version: 1.0.5-3840e6b5a305ccb803d29b468556739e75532d56
> 2 Nodes configured, 2 expected votes
> 0 Resources configured.
> ============
>
> Online: [ boot1 boot2 ]
>
> If I go onto the host 'boot2' and issue the command "killall -QUIT
> corosync", the anticipated result is that boot2 would go offline (out of
> the cluster) and all of the cluster processes
> (corosync/stonithd/cib/lrmd/attrd/pengine/crmd) would shut down.
> However, this is not occurring, and I don't really have any idea why.
> After logging into boot2 and issuing the command "killall -QUIT
> corosync", the result is a split-brain:
>
> From boot1's viewpoint:
> ============
> Last updated: Wed Sep 23 15:58:27 2009
> Stack: openais
> Current DC: boot1 - partition WITHOUT quorum
> Version: 1.0.5-3840e6b5a305ccb803d29b468556739e75532d56
> 2 Nodes configured, 2 expected votes
> 0 Resources configured.
> ============
>
> Online: [ boot1 ]
> OFFLINE: [ boot2 ]
>
> From boot2's viewpoint:
> ============
> Last updated: Wed Sep 23 15:58:35 2009
> Stack: openais
> Current DC: boot1 - partition with quorum
> Version: 1.0.5-3840e6b5a305ccb803d29b468556739e75532d56
> 2 Nodes configured, 2 expected votes
> 0 Resources configured.
> ============
>
> Online: [ boot1 boot2 ]
>
> At this point the status quo holds until ANOTHER QUIT signal is sent to
> corosync (i.e. "killall -QUIT corosync" is executed on boot2 again).
> Then boot2 shuts down properly and everything appears to be kosher.
> Basically, what I expect to happen after a single QUIT signal is instead
> taking two QUIT signals; and that summarizes my question: why does it
> take two QUIT signals to force corosync to actually shut down? Is that
> the desired behavior? From everything that I have read online it seems
> very strange, and it makes me think that I have a problem in my
> configuration(s), but I have no idea what that would be, even after
> playing with things and investigating for the day.
>
> I would be very grateful for any guidance, as at the moment I seem to be
> at an impasse.
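For what it's worth, one common reason a daemon needs two signals is a two-phase handler: the first delivery only *requests* a graceful shutdown (so dependent services can be torn down in order), and a repeat delivery forces an immediate exit if the graceful path has stalled. Whether corosync's C signal handler actually works this way I can't say; below is purely an illustrative sketch of the pattern in Python, using SIGUSR1 in place of SIGQUIT so it is safe to run anywhere:

```python
import os
import signal
import sys

shutdown_requested = False

def handle_quit(signum, frame):
    """First delivery: flag a graceful shutdown and keep running so
    in-flight work can be torn down. Repeat delivery: exit immediately."""
    global shutdown_requested
    if not shutdown_requested:
        shutdown_requested = True   # phase 1: orderly teardown begins
    else:
        sys.exit(1)                 # phase 2: forced exit on repeat signal

# SIGUSR1 stands in for SIGQUIT so this demo doesn't dump core.
signal.signal(signal.SIGUSR1, handle_quit)

os.kill(os.getpid(), signal.SIGUSR1)   # first signal: flag set, no exit
print(shutdown_requested)              # True
```

If the phase-1 teardown never completes (e.g. a child service refuses to stop), a process built this way looks exactly like what you describe: alive after one QUIT, gone after the second.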
>
> Log files, with debugging set to 'on', can be found at the following
> pastebin locations:
>
> After first QUIT signal issued on boot2:
> boot1:/var/log/syslog: http://pastebin.com/m7f9a61fd
> boot2:/var/log/syslog: http://pastebin.com/d26fdfee
>
> After second QUIT signal issued on boot2:
> boot1:/var/log/syslog: http://pastebin.com/m755fb989
> boot2:/var/log/syslog: http://pastebin.com/m22dcef45
>
> OS, Software Packages, and Versions:
>   * two nodes, each running Ubuntu Hardy Heron LTS
>   * ubuntu-ha packages, as downloaded from
>     http://ppa.launchpad.net/ubuntu-ha-maintainers/ppa/ubuntu/:
>       * pacemaker-openais package version 1.0.5+hg20090813-0ubuntu2~hardy1
>       * openais package version 1.0.0-3ubuntu1~hardy1
>       * corosync package version 1.0.0-4ubuntu1~hardy2
>       * heartbeat-common package version 2.99.2+sles11r9-5ubuntu1~hardy1
>
> Network Setup:
>   * boot1
>       * eth0 is 192.168.10.192
>       * eth1 is 172.16.1.1
>   * boot2
>       * eth0 is 192.168.10.193
>       * eth1 is 172.16.1.2
>   * boot1:eth0 and boot2:eth0 both connect to the same switch.
>   * boot1:eth1 and boot2:eth1 are connected directly to each other via a
>     cross-over cable.
>   * No firewalls are involved, and tcpdump shows the multicast and UDP
>     traffic flowing correctly over these links.
>   * I attempted a broadcast (rather than multicast) configuration to see
>     if that would fix the problem. It did not.
>
> `crm configure show` output:
>
> node boot1
> node boot2
> property $id="cib-bootstrap-options" \
>         dc-version="1.0.5-3840e6b5a305ccb803d29b468556739e75532d56" \
>         cluster-infrastructure="openais" \
>         expected-quorum-votes="2" \
>         stonith-enabled="false" \
>         no-quorum-policy="ignore"
>
> Contents of /etc/corosync/corosync.conf:
>
> # Please read the corosync.conf.5 manual page
> compatibility: whitetank
>
> totem {
>         clear_node_high_bit: yes
>         version: 2
>         secauth: on
>         threads: 1
>         heartbeat_failures_allowed: 3
>         interface {
>                 ringnumber: 0
>                 bindnetaddr: 172.16.1.0
>                 mcastaddr: 239.42.0.1
>                 mcastport: 5505
>         }
>         interface {
>                 ringnumber: 1
>                 bindnetaddr: 192.168.10.0
>                 mcastaddr: 239.42.0.2
>                 mcastport: 6606
>         }
>         rrp_mode: passive
> }
>
> amf {
>         mode: disabled
> }
>
> service {
>         name: pacemaker
>         ver: 0
> }
>
> aisexec {
>         user: root
>         group: root
> }
>
> logging {
>         debug: on
>         fileline: off
>         function_name: off
>         to_logfile: no
>         to_stderr: no
>         to_syslog: yes
>         timestamp: on
>         logger_subsys {
>                 subsys: AMF
>                 debug: off
>                 tags: enter|leave|trace1|trace2|trace3|trace4|trace6
>         }
> }
>
> --
> Remi Broemeling
> Sr System Administrator
>
> Nexopia.com Inc.
> direct: 780 444 1250 ext 435
> email: r...@nexopia.com
> fax: 780 487 0376
> http://www.nexopia.com
>
> Cat toys, n.: Anything not nailed down, and some that are.
> http://www.fortlangley.ca/pepin/taglines.html
_______________________________________________
Pacemaker mailing list
Pacemaker@oss.clusterlabs.org
http://oss.clusterlabs.org/mailman/listinfo/pacemaker