[Bug 158288] Node hangs at clvm when joining cluster

funkypantsboy Mon, 29 Oct 2007 02:45:17 -0800

Public bug reported:

Binary package hint: redhat-cluster-suite


This bug relates to the latest security release of redhat-cluster-suite
for Dapper (1.20060222-0ubuntu6.1).  My kernel version is
2.6.15-29-amd64-server.

I have a cluster with three nodes, all attached to a Fibre Channel SAN (an HP
MSA1000) and serving a GFS filesystem.  These machines have been working
nicely for a long time.  Over the weekend I ran apt-get to upgrade to the latest
version of redhat-cluster-suite.  Now, when the cluster boots, the first
two nodes to come up can see the GFS filesystem.  However, the
third node to come up hangs when starting the clvm service.
At the same time, I see the following messages in /var/log/syslog on one of
the other machines in the cluster:

Oct 28 14:42:18 machinea kernel: [ 1681.325152] CMAN: node machinec rejoining
Oct 28 14:42:20 machinea kernel: [ 1683.528299] Extra connection from node 2 attempted

It does not seem to matter which order the nodes come up in; it is
always the third node to boot that hangs when starting clvmd. I
suspect this may have something to do with the inclusion of the fix for
bz#245892 in the kernel (http://lkml.org/lkml/2007/7/9/274)?  I have
included my cluster.conf file below for reference, and I can provide any
additional diagnostics as required.

Help??

Stephen

<?xml version="1.0"?>
<cluster config_version="14" name="alpha_cluster">
        <fence_daemon post_fail_delay="0" post_join_delay="3"/>
        <clusternodes>
                <clusternode name="machineaint" votes="1">
                        <fence>
                                <method name="1">
                                        <device name="machinea_ILO"/>
                                </method>
                        </fence>
                </clusternode>
                <clusternode name="machinebint" votes="1">
                        <fence>
                                <method name="1">
                                        <device name="machineb_ILO"/>
                                </method>
                        </fence>
                </clusternode>
                <clusternode name="machinecint" votes="1">
                        <fence>
                                <method name="1">
                                        <device name="machinec_ILO"/>
                                </method>
                        </fence>
                </clusternode>
        </clusternodes>
        <cman/>
        <fencedevices>
                <fencedevice agent="fence_ilo" hostname="192.168.81.200" login="Login" name="machinea_ILO" passwd="Passwd"/>
                <fencedevice agent="fence_ilo" hostname="192.168.81.199" login="Login" name="machineb_ILO" passwd="Passwd"/>
                <fencedevice agent="fence_ilo" hostname="192.168.81.197" login="Login" name="machinec_ILO" passwd="Passwd"/>
        </fencedevices>
        <rm>
                <failoverdomains>
                        <failoverdomain name="fileservers" ordered="0" restricted="0">
                                <failoverdomainnode name="machineaint" priority="1"/>
                                <failoverdomainnode name="machinebint" priority="1"/>
                                <failoverdomainnode name="machinecint" priority="1"/>
                        </failoverdomain>
                        <failoverdomain name="backupers" ordered="0" restricted="1">
                                <failoverdomainnode name="machineaint" priority="1"/>
                                <failoverdomainnode name="machinebint" priority="1"/>
                        </failoverdomain>
                </failoverdomains>
                <resources>
                        <ip address="192.168.81.98" monitor_link="1"/>
                </resources>
                <service autostart="1" domain="fileservers" exclusive="1" name="fileserver_ip">
                        <ip ref="192.168.81.98"/>
                </service>
                <service autostart="1" domain="backupers" name="backups">
                        <script file="/etc/init.d/dsmcad-init" name="TSM backup script"/>
                </service>
        </rm>
</cluster>
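Before digging into CMAN itself, a configuration like the one above can be sanity-checked for dangling references. The following is a minimal sketch, assuming Python 3 and the standard library's xml.etree.ElementTree. The helper names check_cluster_conf and total_votes are hypothetical and not part of redhat-cluster-suite. The script only confirms that names referenced in cluster.conf resolve (fence devices, failover-domain nodes, service domains and IP resources) and sums the node votes, assuming the default of 1 vote when a clusternode has no votes attribute. It does not diagnose the clvmd hang.

```python
import xml.etree.ElementTree as ET


def check_cluster_conf(xml_text):
    """Hypothetical helper: return a list of unresolved references in a cluster.conf."""
    root = ET.fromstring(xml_text)
    problems = []

    nodes = {n.get("name") for n in root.iter("clusternode")}
    fence_devices = {d.get("name") for d in root.iter("fencedevice")}
    domains = {d.get("name") for d in root.iter("failoverdomain")}
    resource_ips = {ip.get("address") for ip in root.findall("./rm/resources/ip")}

    # Each node's fence <device name=...> should match a defined <fencedevice>.
    for node in root.iter("clusternode"):
        for dev in node.iter("device"):
            if dev.get("name") not in fence_devices:
                problems.append(
                    f"node {node.get('name')}: unknown fence device {dev.get('name')}"
                )

    # Failover domains should only list nodes declared under <clusternodes>.
    for fdn in root.iter("failoverdomainnode"):
        if fdn.get("name") not in nodes:
            problems.append(f"failover domain lists unknown node {fdn.get('name')}")

    # Services should point at existing failover domains and <ip> resources.
    for svc in root.iter("service"):
        dom = svc.get("domain")
        if dom is not None and dom not in domains:
            problems.append(f"service {svc.get('name')}: unknown domain {dom}")
        for ip in svc.findall("ip"):
            ref = ip.get("ref")
            if ref is not None and ref not in resource_ips:
                problems.append(f"service {svc.get('name')}: unknown ip resource {ref}")

    return problems


def total_votes(xml_text):
    """Sum clusternode votes, assuming a default of 1 when the attribute is absent."""
    root = ET.fromstring(xml_text)
    return sum(int(n.get("votes", "1")) for n in root.iter("clusternode"))
```

For example, passing the text of /etc/cluster/cluster.conf to check_cluster_conf() should return an empty list for the configuration above, and total_votes() should return 3. Note that ET.fromstring rejects input with leading whitespace before the <?xml ...?> declaration, so the file contents should be passed unmodified.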

** Affects: redhat-cluster-suite (Ubuntu)
     Importance: Undecided
         Status: New

-- 
Node hangs at clvm when joining cluster
https://bugs.launchpad.net/bugs/158288
You received this bug notification because you are a member of Ubuntu
Bugs, which is the bug contact for Ubuntu.

-- 
ubuntu-bugs mailing list
https://lists.ubuntu.com/mailman/listinfo/ubuntu-bugs
