I'm working on setting up some new lab hardware, among which is a shiny new dual Opteron server running RHEL 3. The box has two Opteron 244 CPUs and 6GB of DDR ECC RAM installed in six 1GB sticks (there are 8 DIMM slots in total). The motherboard is a Tyan S2882 (a.k.a. Thunder K8S Pro). Everything seems to run fine in the limited testing I've done so far, but every few seconds I see this appear in the syslog:

Feb 12 18:04:11 rtp-wbu-sh-m1 kernel: CPU 0: Silent Northbridge MCE
Feb 12 18:04:11 rtp-wbu-sh-m1 kernel: Northbridge status a40000000005001b
Feb 12 18:04:11 rtp-wbu-sh-m1 kernel: GART TLB error generic level generic
Feb 12 18:04:11 rtp-wbu-sh-m1 kernel: extended error gart error
Feb 12 18:04:11 rtp-wbu-sh-m1 kernel: link number 0
Feb 12 18:04:11 rtp-wbu-sh-m1 kernel: error address valid
Feb 12 18:04:11 rtp-wbu-sh-m1 kernel: error uncorrected
Feb 12 18:04:11 rtp-wbu-sh-m1 kernel: previous error lost
Feb 12 18:04:11 rtp-wbu-sh-m1 kernel: error address 00000000fafe1a68

I thought this looked like it could be bad RAM. But when I pull out two--*any* two--sticks of RAM, the error message goes away. It seems to be tied to the fact that I have >4GB of memory. According to the motherboard manual, when you use more than 6 DIMMs on this board, you're using a 128-bit (interleaved) memory configuration as opposed to a 64-bit (noninterleaved) configuration with 4 or fewer DIMMs (ref. page 30 of ftp://ftp.tyan.com/manuals/m_s2882_101.pdf), if that's any hint.

I've tried rearranging the DIMMs in every valid way listed in the motherboard's manual, to no avail. I even ran memtest86 (www.memtest.org) just to be sure I didn't have bad RAM. I'm using RHEL stock kernel 2.4.21-9.ELsmp and had the same problem on 2.4.21-4.ELsmp.

The box seems to run fine, but those errors clogging up my syslog have me worried. Anyone know what might be happening here? I'm not sure whether to complain to the vendor that something is fishy with their hardware or whether this is a software issue.

At Your Service,
-- 
Mark T. Voelker

[EMAIL PROTECTED] root]# free
             total       used       free     shared    buffers     cached
Mem:       5976880     657272    5319608          0     105080     222268
-/+ buffers/cache:     329924    5646956
Swap:      2040244          0    2040244
[EMAIL PROTECTED] root]# uname -a
Linux localhost.localdomain 2.4.21-9.ELsmp #1 SMP Thu Feb 12 16:03:39 EST 2004 x86_64 x86_64 x86_64 GNU/Linux
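For anyone trying to read the "Northbridge status a40000000005001b" line in the log, here is a minimal Python sketch that splits that 64-bit word into fields. It assumes the architectural AMD64 MCi_STATUS flag positions (bits 63 to 57) and, for the remaining fields, the K8 northbridge layout as I understand it: error code in bits 15:0, extended error code in bits 19:16, HyperTransport link number in bits 38:36. Treat those field positions as assumptions to check against AMD's BIOS and Kernel Developer's Guide. The function name decode_nb_status is hypothetical, not a kernel or library API.

```python
# Hypothetical helper: split an AMD K8 northbridge MCE status word into fields.
# Bit positions 63..57 follow the architectural AMD64 MCi_STATUS layout; the
# error-code, extended-code and link-number positions are assumptions based on
# the K8 northbridge layout and should be checked against AMD's BKDG.

STATUS_FLAGS = {
    63: "VAL",    # register contents are valid
    62: "OVER",   # overflow: a previous error was lost
    61: "UC",     # error was not corrected
    60: "EN",     # error reporting was enabled
    59: "MISCV",  # MISC register holds extra info
    58: "ADDRV",  # ADDR register holds the error address
    57: "PCC",    # processor context corrupt
}

# TLB error codes have the form 0000 0000 0001 TTLL (transaction type, level).
TLB_TRANSACTION = {0: "instruction", 1: "data", 2: "generic", 3: "reserved"}
CACHE_LEVEL = {0: "level 0", 1: "level 1", 2: "level 2", 3: "generic"}


def decode_nb_status(status: int) -> dict:
    """Return the set flags and error-code fields of a 64-bit status word."""
    # Collect flag names from the highest bit down.
    flags = [name for bit, name in sorted(STATUS_FLAGS.items(), reverse=True)
             if (status >> bit) & 1]
    error_code = status & 0xFFFF  # bits 15:0
    result = {
        "flags": flags,
        "error_code": error_code,
        "ext_error_code": (status >> 16) & 0xF,  # bits 19:16 (assumed)
        "link": (status >> 36) & 0x7,            # bits 38:36 (assumed)
    }
    if (error_code & 0xFFF0) == 0x0010:  # matches the TLB error form
        result["tlb_error"] = (TLB_TRANSACTION[(error_code >> 2) & 0x3],
                               CACHE_LEVEL[error_code & 0x3])
    return result


if __name__ == "__main__":
    # Value taken from the "Northbridge status" syslog line.
    for field, value in decode_nb_status(0xA40000000005001B).items():
        print(field, value)
```

Run against this box's value, the sketch reports VAL, UC and ADDRV set with OVER clear, error code 0x001b (a TLB error, generic transaction type, generic level) and extended code 5, which the kernel printed as "gart error". Since OVER (bit 62) is clear, the kernel's "previous error lost" line may come from its own labelling of bit 63 rather than an actual overflow; that is a guess, not something verified here. The logged error address, 0xfafe1a68, sits about 80 MiB below the 4 GiB boundary (0x100000000 - 0xfafe1a68 = 0x0501e598 bytes).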
-- 
TriLUG mailing list        : http://www.trilug.org/mailman/listinfo/trilug
TriLUG Organizational FAQ  : http://trilug.org/faq/
TriLUG Member Services FAQ : http://members.trilug.org/services_faq/
TriLUG PGP Keyring         : http://trilug.org/~chrish/trilug.asc
