Re: [CentOS] Weird CentOS 5.3 problem

2009-05-19 Thread Randall Martin



> From: Robert Heller 
> Organization: Deepwoods Software
> Reply-To: CentOS mailing list 
> Date: Tue, 19 May 2009 09:46:15 -0400
> To: CentOS mailing list 
> Cc: 
> Subject: Re: [CentOS] Weird CentOS 5.3 problem
> 
> At Tue, 19 May 2009 09:04:43 -0400 CentOS mailing list 
> wrote:
> 
>> 
>> 
>> 
>> I reimaged a compute node on our cluster with the latest 5.3 updates (we
>> were previously running 5.2), but we kept the kernel at 2.6.18-92.1.10.el5
>> until I can find time to rebuild some of our kernel modules.  After the
>> image install finishes and the system reboots, the eth0 ethernet interface
>> disappears.  If I do an ifconfig -a, I see what should be eth0, but it's
>> listed as __tmp2081258173.
>> 
>> [r...@node0770 ~]# ifconfig -a
>> __tmp2081258173 Link encap:Ethernet  HWaddr 00:1E:68:86:67:04
>>   BROADCAST MULTICAST  MTU:1500  Metric:1
>>   RX packets:0 errors:0 dropped:0 overruns:0 frame:0
>>   TX packets:0 errors:0 dropped:0 overruns:0 carrier:0
>>   collisions:0 txqueuelen:1000
>>   RX bytes:0 (0.0 b)  TX bytes:0 (0.0 b)
>>   Interrupt:66
>> 
>> The dmesg output isn't very helpful:
>> 
>> [r...@node0770 ~]# dmesg|grep eth0
>> eth0: forcedeth.c: subsystem: 0108e:534b bound to :00:08.0
>> 
>> 
>> If I remove our lustre modules that were built for the 2.6.18-92.1.10.el5
>> kernel and reboot, the eth0 interface reappears.  Another piece to this
>> puzzle is that this problem only seems to happen on our Sun X2200's.  Our
>> Dell 1950's work just fine after putting on the 5.3 updates.  Anyone know
>> what could cause this behavior?
> 
> Check /etc/modprobe.conf (and
> /etc/sysconfig/network-scripts/ifcfg-eth0) -- if you are doing a
> disk-to-disk backup type of install, the alias for eth0 is very likely
> wrong (and the HW address in /etc/sysconfig/network-scripts/ifcfg-eth0
> is also wrong).  You may have to manually update these two files on the
> 'new' machine, since it likely has a different NIC, requiring a
> different driver.  It will also have a different MAC (HW) address.
> In the old days, kudzu would detect this and pop up during the
> boot process.
> 
> What does lspci display?
> 


The last two lines of our modprobe.conf are the ones we add for Lustre:

alias eth0 tg3
alias eth1 tg3
alias eth2 forcedeth
alias eth3 forcedeth
alias scsi_hostadapter sata_nv
options lnet networks="tcp0(eth0)"
options ksocklnd enable_irq_affinity=0
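
One thing that can be checked mechanically is whether the driver named in the eth0 alias matches the driver that dmesg reports for eth0. The following is only a rough sketch in Python, not something from the original posts: the function names are made up for illustration, and the two input strings are copied from the modprobe.conf listing above and the dmesg line quoted earlier in this thread.

```python
import re

# Text copied from the modprobe.conf listing in this message.
MODPROBE_CONF = """\
alias eth0 tg3
alias eth1 tg3
alias eth2 forcedeth
alias eth3 forcedeth
alias scsi_hostadapter sata_nv
options lnet networks="tcp0(eth0)"
options ksocklnd enable_irq_affinity=0
"""

# The dmesg line quoted earlier in this thread.
DMESG_LINE = "eth0: forcedeth.c: subsystem: 0108e:534b bound to :00:08.0"


def parse_eth_aliases(text):
    """Return {interface: driver} for every 'alias ethN driver' line."""
    aliases = {}
    for line in text.splitlines():
        m = re.match(r"\s*alias\s+(eth\d+)\s+(\S+)", line)
        if m:
            aliases[m.group(1)] = m.group(2)
    return aliases


def driver_from_dmesg(line):
    """Return (interface, driver) from an 'ethN: driver.c: ...' line, or None."""
    m = re.match(r"(eth\d+):\s+(\w+)\.c:", line)
    return (m.group(1), m.group(2)) if m else None


aliases = parse_eth_aliases(MODPROBE_CONF)
iface, driver = driver_from_dmesg(DMESG_LINE)
mismatch = aliases.get(iface) != driver
print(iface, "alias:", aliases.get(iface), "dmesg:", driver, "mismatch:", mismatch)
```

With the inputs as posted, the alias says tg3 for eth0 while dmesg shows forcedeth claiming eth0. Whether that is related to the rename problem is not established here; it is just a discrepancy worth looking at.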


The /etc/sysconfig/network-scripts/ifcfg-eth0 has the correct settings for
this host.  We actually generate this file during the post-install.  Here's
what it looks like:

DEVICE=eth0
BOOTPROTO=none
STARTMODE=onboot
ONBOOT=yes
USERCTL=no
TYPE=Ethernet
IPV6INIT=no
IPADDR=192.168.3.91
BROADCAST=192.168.255.255
NETMASK=255.255.0.0
GATEWAY=192.168.100.1
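
As a sanity check on the generated file, the address settings can be verified against each other, and the file can be checked for a HWADDR line (the "HW address" mentioned in the reply quoted above). This is a minimal sketch in Python using only the standard library; parse_ifcfg is a hypothetical helper name, and the input is the file exactly as posted.

```python
import ipaddress

# Contents of ifcfg-eth0 as posted in this message.
IFCFG_ETH0 = """\
DEVICE=eth0
BOOTPROTO=none
STARTMODE=onboot
ONBOOT=yes
USERCTL=no
TYPE=Ethernet
IPV6INIT=no
IPADDR=192.168.3.91
BROADCAST=192.168.255.255
NETMASK=255.255.0.0
GATEWAY=192.168.100.1
"""


def parse_ifcfg(text):
    """Parse simple KEY=VALUE lines, skipping blanks and comments."""
    settings = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, value = line.split("=", 1)
        settings[key.strip()] = value.strip().strip('"')
    return settings


cfg = parse_ifcfg(IFCFG_ETH0)

# Build the network from IPADDR and NETMASK; strict=False allows host bits.
net = ipaddress.ip_network(cfg["IPADDR"] + "/" + cfg["NETMASK"], strict=False)

broadcast_ok = str(net.broadcast_address) == cfg["BROADCAST"]
gateway_ok = ipaddress.ip_address(cfg["GATEWAY"]) in net
has_hwaddr = "HWADDR" in cfg

print("network:", net)
print("broadcast consistent:", broadcast_ok)
print("gateway inside network:", gateway_ok)
print("HWADDR present:", has_hwaddr)
```

For the file as posted, the broadcast and gateway values are consistent with the address and netmask, and there is no HWADDR line, so nothing in this file ties eth0 to a particular MAC address.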


Here's the lspci output:

00:00.0 RAM memory: nVidia Corporation MCP55 Memory Controller (rev a2)
00:01.0 ISA bridge: nVidia Corporation MCP55 LPC Bridge (rev a3)
00:01.1 SMBus: nVidia Corporation MCP55 SMBus (rev a3)
00:02.0 USB Controller: nVidia Corporation MCP55 USB Controller (rev a1)
00:02.1 USB Controller: nVidia Corporation MCP55 USB Controller (rev a2)
00:04.0 IDE interface: nVidia Corporation MCP55 IDE (rev a1)
00:05.0 IDE interface: nVidia Corporation MCP55 SATA Controller (rev a3)
00:06.0 PCI bridge: nVidia Corporation MCP55 PCI bridge (rev a2)
00:08.0 Bridge: nVidia Corporation MCP55 Ethernet (rev a3)
00:09.0 Bridge: nVidia Corporation MCP55 Ethernet (rev a3)
00:0a.0 PCI bridge: nVidia Corporation MCP55 PCI Express bridge (rev a3)
00:0b.0 PCI bridge: nVidia Corporation MCP55 PCI Express bridge (rev a3)
00:0c.0 PCI bridge: nVidia Corporation MCP55 PCI Express bridge (rev a3)
00:0d.0 PCI bridge: nVidia Corporation MCP55 PCI Express bridge (rev a3)
00:0f.0 PCI bridge: nVidia Corporation MCP55 PCI Express bridge (rev a3)
00:18.0 Host bridge: Advanced Micro Devices [AMD] Family 10h [Opteron, Athlon64, Sempron] HyperTransport Configuration
00:18.1 Host bridge: Advanced Micro Devices [AMD] Family 10h [Opteron, Athlon64, Sempron] Address Map
00:18.2 Host bridge: Advanced Micro Devices [AMD] Family 10h [Opteron, Athlon64, Sempron] DRAM Controller
00:18.3 Host bridge: Advanced Micro Devices [AMD] Family 10h [Opteron, Athlon64, Sempron] Miscellaneous Control
00:18.4 Host bridge: Advanced Micro Devices [AMD] Family 10h [Opteron, Athlon64, Sempron] Link Control
00:19.0 Host bridge: Advanced Micro Devices [AMD] Family 10h [Opteron, Athlon64, Sempron] HyperTransport Configuration
00:19.1 Host bridge: Advanced Micro Devices [AMD] Family 10h [Opteron, Athlon64, Sempron] Address Map
00:19.2 Host bridge: Advanced Micro Devices [AMD] Family 10h [Opteron, Athlon64, Sempron] DRAM Controller
00:19.3 Host bridge: Advanced Micro Devices [AMD] Family 10h [Opteron, Athlon64, Sempron] Miscellaneous Control
00:19.4 Host bridge: Advanced Micro Devices [AMD] Family 10h [Opteron, Athlon64, Sempron] Link Control
01:05.0 VGA compatible controller: ASPEED Technology, Inc. AST20
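
To tie the lspci listing back to the dmesg line quoted earlier, the Ethernet entries can be pulled out and compared with the PCI slot that forcedeth reports for eth0. Again, this is just a sketch in Python with hypothetical function names; it only covers the Ethernet lines visible in the listing above, which is cut off, so devices on later buses would not be seen.

```python
import re

# The two Ethernet lines from the lspci listing above (the listing is cut off).
LSPCI_OUTPUT = """\
00:06.0 PCI bridge: nVidia Corporation MCP55 PCI bridge (rev a2)
00:08.0 Bridge: nVidia Corporation MCP55 Ethernet (rev a3)
00:09.0 Bridge: nVidia Corporation MCP55 Ethernet (rev a3)
00:0a.0 PCI bridge: nVidia Corporation MCP55 PCI Express bridge (rev a3)
"""

# The dmesg line quoted earlier in this thread.
DMESG_LINE = "eth0: forcedeth.c: subsystem: 0108e:534b bound to :00:08.0"

LSPCI_LINE = re.compile(r"^([0-9a-f]{2}:[0-9a-f]{2}\.[0-9a-f])\s+([^:]+):\s+(.*)$")
SLOT_AT_END = re.compile(r"([0-9a-f]{2}:[0-9a-f]{2}\.[0-9a-f])\s*$")


def ethernet_slots(lspci_text):
    """Return the PCI slots of every lspci line whose description mentions Ethernet."""
    slots = []
    for line in lspci_text.splitlines():
        m = LSPCI_LINE.match(line)
        if m and "Ethernet" in m.group(3):
            slots.append(m.group(1))
    return slots


def slot_from_dmesg(line):
    """Return the bus:device.function at the end of a dmesg line, or None."""
    m = SLOT_AT_END.search(line)
    return m.group(1) if m else None


slots = ethernet_slots(LSPCI_OUTPUT)
eth0_slot = slot_from_dmesg(DMESG_LINE)
print("Ethernet devices:", slots)
print("eth0 bound to:", eth0_slot, "listed:", eth0_slot in slots)
```

For the lines shown, the slot in the dmesg line (00:08.0) corresponds to the first of the two MCP55 Ethernet devices.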

[CentOS] Weird CentOS 5.3 problem

2009-05-19 Thread Randall Martin
I reimaged a compute node on our cluster with the latest 5.3 updates (we
were previously running 5.2), but we kept the kernel at 2.6.18-92.1.10.el5
until I can find time to rebuild some of our kernel modules.  After the
image install finishes and the system reboots, the eth0 ethernet interface
disappears.  If I do an ifconfig -a, I see what should be eth0, but it's
listed as __tmp2081258173.

[r...@node0770 ~]# ifconfig -a
__tmp2081258173 Link encap:Ethernet  HWaddr 00:1E:68:86:67:04
  BROADCAST MULTICAST  MTU:1500  Metric:1
  RX packets:0 errors:0 dropped:0 overruns:0 frame:0
  TX packets:0 errors:0 dropped:0 overruns:0 carrier:0
  collisions:0 txqueuelen:1000
  RX bytes:0 (0.0 b)  TX bytes:0 (0.0 b)
  Interrupt:66

The dmesg output isn't very helpful:

[r...@node0770 ~]# dmesg|grep eth0
eth0: forcedeth.c: subsystem: 0108e:534b bound to :00:08.0
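
For checking many nodes at once, the ifconfig -a listing above can be scanned for interfaces that are stuck with a temporary __tmp name. This is a small sketch in Python, not part of the original report; find_temp_named_interfaces is a hypothetical name, and the sample input is the ifconfig output pasted above.

```python
import re

# ifconfig -a output as pasted above.
IFCONFIG_OUTPUT = """\
__tmp2081258173 Link encap:Ethernet  HWaddr 00:1E:68:86:67:04
  BROADCAST MULTICAST  MTU:1500  Metric:1
  RX packets:0 errors:0 dropped:0 overruns:0 frame:0
  TX packets:0 errors:0 dropped:0 overruns:0 carrier:0
  collisions:0 txqueuelen:1000
  RX bytes:0 (0.0 b)  TX bytes:0 (0.0 b)
  Interrupt:66
"""

# Interface header lines start in column 0; detail lines are indented.
IFACE_LINE = re.compile(r"^(\S+)\s+Link encap:\S+\s+HWaddr\s+([0-9A-Fa-f:]{17})")


def find_temp_named_interfaces(text):
    """Return (name, mac) for every interface whose name starts with '__tmp'."""
    found = []
    for line in text.splitlines():
        m = IFACE_LINE.match(line)
        if m and m.group(1).startswith("__tmp"):
            found.append((m.group(1), m.group(2)))
    return found


stuck = find_temp_named_interfaces(IFCONFIG_OUTPUT)
for name, mac in stuck:
    print("temporary name:", name, "MAC:", mac)
```

The MAC address it reports is the one to compare against whatever the node is expected to have on eth0.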


If I remove our lustre modules that were built for the 2.6.18-92.1.10.el5
kernel and reboot, the eth0 interface reappears.  Another piece to this
puzzle is that this problem only seems to happen on our Sun X2200's.  Our
Dell 1950's work just fine after putting on the 5.3 updates.  Anyone know
what could cause this behavior?

Thanks,
Randy
___
CentOS mailing list
CentOS@centos.org
http://lists.centos.org/mailman/listinfo/centos