[CentOS] Weird CentOS 5.3 problem

2009-05-19 Thread Randall Martin
I reimaged a compute node on our cluster with the latest 5.3 updates (we
were previously running 5.2), but we kept the kernel at 2.6.18-92.1.10.el5
until I can find time to rebuild some of our kernel modules.  After the
image install finishes and the system reboots, the eth0 ethernet interface
disappears.  If I do an ifconfig -a, I see what should be eth0, but it's
listed as __tmp2081258173.

[r...@node0770 ~]# ifconfig -a
__tmp2081258173 Link encap:Ethernet  HWaddr 00:1E:68:86:67:04
  BROADCAST MULTICAST  MTU:1500  Metric:1
  RX packets:0 errors:0 dropped:0 overruns:0 frame:0
  TX packets:0 errors:0 dropped:0 overruns:0 carrier:0
  collisions:0 txqueuelen:1000
  RX bytes:0 (0.0 b)  TX bytes:0 (0.0 b)
  Interrupt:66

The dmesg output isn't very helpful:

[r...@node0770 ~]# dmesg|grep eth0
eth0: forcedeth.c: subsystem: 0108e:534b bound to :00:08.0


If I remove our lustre modules that were built for the 2.6.18-92.1.10.el5
kernel and reboot, the eth0 interface reappears.  Another piece to this
puzzle is that this problem only seems to happen on our Sun X2200s.  Our
Dell 1950s work just fine after putting on the 5.3 updates.  Anyone know
what could cause this behavior?

Thanks,
Randy
___
CentOS mailing list
CentOS@centos.org
http://lists.centos.org/mailman/listinfo/centos
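
For checking many nodes, the symptom described above can be spotted mechanically. The following is a minimal sketch, not part of the original post: it parses text shaped like the `ifconfig -a` output quoted in the message and flags interface names that still carry the `__tmp` prefix. The function names and the trimmed sample text are illustrative assumptions.

```python
import re

# Sample text trimmed from the `ifconfig -a` output quoted in the message.
IFCONFIG_OUTPUT = """\
__tmp2081258173 Link encap:Ethernet  HWaddr 00:1E:68:86:67:04
  BROADCAST MULTICAST  MTU:1500  Metric:1
  Interrupt:66
"""

# An interface header line starts in column 0 with the interface name,
# followed by "Link encap:" and, for Ethernet devices, "HWaddr <mac>".
HEADER = re.compile(r"^(\S+)\s+Link encap:(\S+)(?:\s+HWaddr\s+(\S+))?")


def list_interfaces(text):
    """Return (name, hwaddr) pairs, one per interface header line."""
    found = []
    for line in text.splitlines():
        match = HEADER.match(line)
        if match:
            found.append((match.group(1), match.group(3)))
    return found


def temporary_names(interfaces):
    """Keep only interfaces whose name starts with the __tmp prefix."""
    return [(name, mac) for name, mac in interfaces if name.startswith("__tmp")]


if __name__ == "__main__":
    for name, mac in temporary_names(list_interfaces(IFCONFIG_OUTPUT)):
        print(f"{name} ({mac}) does not have its expected ethN name")
```

The MAC address it reports can then be compared by hand with the hardware address expected for eth0 on that node.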


Re: [CentOS] Weird CentOS 5.3 problem

2009-05-19 Thread Robert Heller
At Tue, 19 May 2009 09:04:43 -0400 CentOS mailing list centos@centos.org 
wrote:

 
 
 
> If I remove our lustre modules that were built for the 2.6.18-92.1.10.el5
> kernel and reboot, the eth0 interface reappears.  Another piece to this
> puzzle is that this problem only seems to happen on our Sun X2200s.  Our
> Dell 1950s work just fine after putting on the 5.3 updates.  Anyone know
> what could cause this behavior?

Check /etc/modprobe.conf (and
/etc/sysconfig/network-scripts/ifcfg-eth0) -- if you are doing a
disk-to-disk backup type of install, the alias for eth0 is very likely
wrong (and the HW address in /etc/sysconfig/network-scripts/ifcfg-eth0
is also wrong).  You may have to manually update these two files on the
'new' machine, since it likely has a different NIC, requiring a
different driver.  It will also have a different MAC (HW) address.
In the old days, kudzu would detect this and pop up during the
boot process.

What does lspci display?

 
 
  

-- 
Robert Heller -- 978-544-6933
Deepwoods Software-- Download the Model Railroad System
http://www.deepsoft.com/  -- Binaries for Linux and MS-Windows
hel...@deepsoft.com   -- http://www.deepsoft.com/ModelRailroadSystem/
 


Re: [CentOS] Weird CentOS 5.3 problem

2009-05-19 Thread Randall Martin



 From: Robert Heller hel...@deepsoft.com
 Organization: Deepwoods Software
 Reply-To: CentOS mailing list centos@centos.org
 Date: Tue, 19 May 2009 09:46:15 -0400
 To: CentOS mailing list centos@centos.org
 Cc: centos@centos.org
 Subject: Re: [CentOS] Weird CentOS 5.3 problem
 
 
> Check /etc/modprobe.conf (and
> /etc/sysconfig/network-scripts/ifcfg-eth0) -- if you are doing a
> disk-to-disk backup type of install, the alias for eth0 is very likely
> wrong (and the HW address in /etc/sysconfig/network-scripts/ifcfg-eth0
> is also wrong).  You may have to manually update these two files on the
> 'new' machine, since it likely has a different NIC, requiring a
> different driver.  It will also have a different MAC (HW) address.
> In the old days, kudzu would detect this and pop up during the
> boot process.
>
> What does lspci display?
 


Here is our modprobe.conf.  The last two lines are the ones we add for
lustre:

alias eth0 tg3
alias eth1 tg3
alias eth2 forcedeth
alias eth3 forcedeth
alias scsi_hostadapter sata_nv
options lnet networks=tcp0(eth0)
options ksocklnd enable_irq_affinity=0
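
The check suggested in the quoted reply (does the eth0 alias match the NIC's driver?) can be sketched as follows. This is only an illustration built from the two pieces of text posted in this thread: the modprobe.conf listing above and the single dmesg line from the first message. The helper names are hypothetical, and the regular expression assumes dmesg lines shaped like `eth0: forcedeth.c: ...`.

```python
import re

# The modprobe.conf contents as posted in this message.
MODPROBE_CONF = """\
alias eth0 tg3
alias eth1 tg3
alias eth2 forcedeth
alias eth3 forcedeth
alias scsi_hostadapter sata_nv
options lnet networks=tcp0(eth0)
options ksocklnd enable_irq_affinity=0
"""

# The dmesg line as posted in the first message of the thread.
DMESG_LINE = "eth0: forcedeth.c: subsystem: 0108e:534b bound to :00:08.0"


def parse_aliases(text):
    """Map alias name -> module for every 'alias NAME MODULE' line."""
    aliases = {}
    for line in text.splitlines():
        fields = line.split()
        if len(fields) == 3 and fields[0] == "alias":
            aliases[fields[1]] = fields[2]
    return aliases


def driver_from_dmesg(line):
    """Extract (interface, driver) from a line like 'eth0: forcedeth.c: ...'.

    Assumes the driver prints its source file name after the interface name.
    """
    match = re.match(r"^(\w+): (\w+)\.c:", line)
    return (match.group(1), match.group(2)) if match else None


def alias_mismatch(aliases, iface, driver):
    """Return the configured module if it differs from the bound driver."""
    configured = aliases.get(iface)
    if configured is not None and configured != driver:
        return configured
    return None


if __name__ == "__main__":
    iface, driver = driver_from_dmesg(DMESG_LINE)
    configured = alias_mismatch(parse_aliases(MODPROBE_CONF), iface, driver)
    if configured:
        print(f"{iface}: modprobe.conf says {configured}, dmesg says {driver}")
```

On the text posted here it reports that modprobe.conf aliases eth0 to tg3 while the dmesg line shows forcedeth claiming eth0. Whether that difference explains the missing interface is not established in this thread.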


The /etc/sysconfig/network-scripts/ifcfg-eth0 has the correct settings for
this host.  We actually generate this file during the post-install.  Here's
what it looks like:

DEVICE=eth0
BOOTPROTO=none
STARTMODE=onboot
ONBOOT=yes
USERCTL=no
TYPE=Ethernet
IPV6INIT=no
IPADDR=192.168.3.91
BROADCAST=192.168.255.255
NETMASK=255.255.0.0
GATEWAY=192.168.100.1
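
Since this file is generated during post-install, a quick consistency check on its values may be useful. The sketch below is an assumption-laden illustration, not the script used on the cluster: it parses KEY=VALUE lines like the ones above and uses Python's standard `ipaddress` module to confirm that BROADCAST and GATEWAY agree with IPADDR and NETMASK. It also notes whether an HWADDR line is present, since the earlier reply mentions the HW address in this file.

```python
import ipaddress

# The ifcfg-eth0 contents as posted in this message.
IFCFG_ETH0 = """\
DEVICE=eth0
BOOTPROTO=none
STARTMODE=onboot
ONBOOT=yes
USERCTL=no
TYPE=Ethernet
IPV6INIT=no
IPADDR=192.168.3.91
BROADCAST=192.168.255.255
NETMASK=255.255.0.0
GATEWAY=192.168.100.1
"""


def parse_ifcfg(text):
    """Parse KEY=VALUE lines, skipping blanks and comments."""
    settings = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, value = line.split("=", 1)
        settings[key.strip()] = value.strip().strip('"')
    return settings


def check_addressing(settings):
    """Return a list of notes about inconsistent or absent settings."""
    notes = []
    iface = ipaddress.ip_interface(f"{settings['IPADDR']}/{settings['NETMASK']}")
    network = iface.network
    broadcast = settings.get("BROADCAST")
    if broadcast and ipaddress.ip_address(broadcast) != network.broadcast_address:
        notes.append(f"BROADCAST should be {network.broadcast_address}")
    gateway = settings.get("GATEWAY")
    if gateway and ipaddress.ip_address(gateway) not in network:
        notes.append(f"GATEWAY {gateway} is outside {network}")
    if "HWADDR" not in settings:
        notes.append("no HWADDR line, so the file does not name a specific NIC")
    return notes


if __name__ == "__main__":
    for note in check_addressing(parse_ifcfg(IFCFG_ETH0)):
        print(note)
```

For the file shown, the address, netmask, broadcast, and gateway are mutually consistent, and the only note is the absence of an HWADDR line.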


Here's the lspci output:

00:00.0 RAM memory: nVidia Corporation MCP55 Memory Controller (rev a2)
00:01.0 ISA bridge: nVidia Corporation MCP55 LPC Bridge (rev a3)
00:01.1 SMBus: nVidia Corporation MCP55 SMBus (rev a3)
00:02.0 USB Controller: nVidia Corporation MCP55 USB Controller (rev a1)
00:02.1 USB Controller: nVidia Corporation MCP55 USB Controller (rev a2)
00:04.0 IDE interface: nVidia Corporation MCP55 IDE (rev a1)
00:05.0 IDE interface: nVidia Corporation MCP55 SATA Controller (rev a3)
00:06.0 PCI bridge: nVidia Corporation MCP55 PCI bridge (rev a2)
00:08.0 Bridge: nVidia Corporation MCP55 Ethernet (rev a3)
00:09.0 Bridge: nVidia Corporation MCP55 Ethernet (rev a3)
00:0a.0 PCI bridge: nVidia Corporation MCP55 PCI Express bridge (rev a3)
00:0b.0 PCI bridge: nVidia Corporation MCP55 PCI Express bridge (rev a3)
00:0c.0 PCI bridge: nVidia Corporation MCP55 PCI Express bridge (rev a3)
00:0d.0 PCI bridge: nVidia Corporation MCP55 PCI Express bridge (rev a3)
00:0f.0 PCI bridge: nVidia Corporation MCP55 PCI Express bridge (rev a3)
00:18.0 Host bridge: Advanced Micro Devices [AMD] Family 10h [Opteron, Athlon64, Sempron] HyperTransport Configuration
00:18.1 Host bridge: Advanced Micro Devices [AMD] Family 10h [Opteron, Athlon64, Sempron] Address Map
00:18.2 Host bridge: Advanced Micro Devices [AMD] Family 10h [Opteron, Athlon64, Sempron] DRAM Controller
00:18.3 Host bridge: Advanced Micro Devices [AMD] Family 10h [Opteron, Athlon64, Sempron] Miscellaneous Control
00:18.4 Host bridge: Advanced Micro Devices [AMD] Family 10h [Opteron, Athlon64, Sempron] Link Control
00:19.0 Host bridge: Advanced Micro Devices [AMD] Family 10h [Opteron, Athlon64, Sempron] HyperTransport Configuration
00:19.1 Host bridge: Advanced Micro Devices [AMD] Family 10h [Opteron, Athlon64, Sempron] Address Map
00:19.2 Host bridge: Advanced Micro Devices [AMD] Family 10h [Opteron, Athlon64, Sempron] DRAM Controller
00:19.3 Host bridge: Advanced Micro Devices [AMD] Family 10h [Opteron, Athlon64, Sempron] Miscellaneous Control
00:19.4 Host bridge: Advanced Micro Devices [AMD] Family 10h [Opteron, Athlon64, Sempron] Link Control
01:05.0 VGA compatible controller: ASPEED Technology, Inc