Public bug reported:

Binary package hint: redhat-cluster-suite

This bug relates to the latest security release of redhat-cluster-suite for Dapper (1.20060222-0ubuntu6.1). My kernel version is 2.6.15-29-amd64-server.

I have a cluster with three nodes, all attached to a Fibre SAN (an HP MSA1000) and serving a GFS filesystem. These machines have been working nicely for a long time. On the weekend I used apt-get to update to the latest version of redhat-cluster-suite. Now, when the cluster boots, the first two nodes to come up are able to see the GFS filesystem. However, the third node to come up hangs at the point of starting the clvm service. At the same time, I see the following messages in /var/log/syslog on one of the other machines in the cluster:

Oct 28 14:42:18 machinea kernel: [ 1681.325152] CMAN: node machinec rejoining
Oct 28 14:42:20 machinea kernel: [ 1683.528299] Extra connection from node 2 attempted

It does not seem to matter which order the nodes come up in: it is always the third node to boot that hangs when starting clvmd.

I think this may have something to do with the inclusion of the fix for bug bz#245892 in the kernel (http://lkml.org/lkml/2007/7/9/274). I have included my cluster.conf file below for reference, and I can include any additional diagnostics as required.

Help?

Stephen

<?xml version="1.0"?>
<cluster config_version="14" name="alpha_cluster">
  <fence_daemon post_fail_delay="0" post_join_delay="3"/>
  <clusternodes>
    <clusternode name="machineaint" votes="1">
      <fence>
        <method name="1">
          <device name="machinea_ILO"/>
        </method>
      </fence>
    </clusternode>
    <clusternode name="machinebint" votes="1">
      <fence>
        <method name="1">
          <device name="machineb_ILO"/>
        </method>
      </fence>
    </clusternode>
    <clusternode name="machinecint" votes="1">
      <fence>
        <method name="1">
          <device name="machinec_ILO"/>
        </method>
      </fence>
    </clusternode>
  </clusternodes>
  <cman/>
  <fencedevices>
    <fencedevice agent="fence_ilo" hostname="192.168.81.200" login="Login" name="machinea_ILO" passwd="Passwd"/>
    <fencedevice agent="fence_ilo" hostname="192.168.81.199" login="Login" name="machineb_ILO" passwd="Passwd"/>
    <fencedevice agent="fence_ilo" hostname="192.168.81.197" login="Login" name="machinec_ILO" passwd="Passwd"/>
  </fencedevices>
  <rm>
    <failoverdomains>
      <failoverdomain name="fileservers" ordered="0" restricted="0">
        <failoverdomainnode name="machineaint" priority="1"/>
        <failoverdomainnode name="machinebint" priority="1"/>
        <failoverdomainnode name="machinecint" priority="1"/>
      </failoverdomain>
      <failoverdomain name="backupers" ordered="0" restricted="1">
        <failoverdomainnode name="machineaint" priority="1"/>
        <failoverdomainnode name="machinebint" priority="1"/>
      </failoverdomain>
    </failoverdomains>
    <resources>
      <ip address="192.168.81.98" monitor_link="1"/>
    </resources>
    <service autostart="1" domain="fileservers" exclusive="1" name="fileserver_ip">
      <ip ref="192.168.81.98"/>
    </service>
    <service autostart="1" domain="backupers" name="backups">
      <script file="/etc/init.d/dsmcad-init" name="TSM backup script"/>
    </service>
  </rm>
</cluster>

** Affects: redhat-cluster-suite (Ubuntu)
     Importance: Undecided
         Status: New

--
Node hangs at clvm when joining cluster
https://bugs.launchpad.net/bugs/158288
You received this bug notification because you are a member of Ubuntu Bugs, which is the bug contact for Ubuntu.
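
As one possible extra diagnostic for a cluster.conf like the one quoted in this report, the following Python sketch checks that every fence device referenced by a clusternode is also defined under <fencedevices>. It is only a sanity check on the configuration file and is not a diagnosis of the clvmd hang. The function name undefined_fence_devices is hypothetical and not part of redhat-cluster-suite; it relies only on the element and attribute names visible in the file above.

```python
import xml.etree.ElementTree as ET


def undefined_fence_devices(xml_text):
    """Map each clusternode name to the fence devices it references
    that have no matching <fencedevice name="..."> definition.

    Hypothetical helper for illustration; it only inspects the XML and
    never contacts the cluster or the fence hardware.
    """
    root = ET.fromstring(xml_text)

    # Names of all fence devices declared anywhere in the config.
    defined = {fd.get("name") for fd in root.iter("fencedevice")}

    missing = {}
    for node in root.iter("clusternode"):
        # Every <device> nested under this node's <fence> methods.
        bad = [dev.get("name") for dev in node.iter("device")
               if dev.get("name") not in defined]
        if bad:
            missing[node.get("name")] = bad
    return missing
```

Passing the contents of the cluster.conf file to undefined_fence_devices returns an empty dict when every referenced device is defined, and otherwise a dict keyed by node name listing the missing device names.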
