Today a colleague reported a kernel panic to me that he hit while
installing Build110 on a Nehalem EP platform. It's Supermicro's
"Tylersburg EP" box with the newest BIOS. (Buggy BIOS!!!)
==================================================
$ prtdiag
System Configuration: Supermicro X8DTN
BIOS Configuration: American Megatrends Inc. 4.6.3 03/05/2009
BMC Configuration: IPMI 2.0 (KCS: Keyboard Controller Style)

==== Processor Sockets ==============================

Version                          Location Tag
-------------------------------- --------------------------
Intel(R) Xeon(R) CPU           X5560  @ 2.80GH CPU 1
Intel(R) Xeon(R) CPU           X5560  @ 2.80GH CPU 2
=================================================

The panic stack is as follows:
==========================================
unix: cmntrap()
unix: cmt_balance + b2  <============Div ZERO
unix: setbackdq()
genunix: sleepq_wakeone_chan()
genunix: cv_signal()
genunix: delay_wakeup()
genunix: callout_list_expire()
genunix: callout_expire()
genunix: callout_execute()
genunix: taskq_thread()
unix: thread_start()
=========================================

After some investigation, I root-caused the problem: the buggy
BIOS _PSD implementation messed up the processor group
structure, see below:
=====================
Socket 0:
cpu0~3: in domain 0
cpu8~11:        in domain 1

Socket 1:
cpu4~7: in domain 0
cpu12~15:       in domain 1
=====================
I rebuilt the kernel to obtain the domain info from cpuid_get_chipid()
instead, and the problem is gone.
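
To make the failure mode concrete, here is a minimal userland sketch (not
the actual kernel code) of the kind of sanity check that would catch the
layout above. It assumes that a dependency domain should never contain
CPUs from more than one socket, and falls back to the chip ID when that
assumption is violated. The struct cpu_rec type, the MAX_DOMAINS bound and
all three function names are hypothetical and exist only for illustration;
chip_id stands in for whatever cpuid_get_chipid() would report.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define MAX_DOMAINS 64	/* hypothetical upper bound for this sketch */

/* Hypothetical per-CPU record; this is not the kernel's cpu_t. */
struct cpu_rec {
	int cpu_id;
	int chip_id;	/* socket, as cpuid_get_chipid() would report it */
	int psd_domain;	/* domain number taken from the BIOS _PSD data */
};

/*
 * Return true if no BIOS-reported domain contains CPUs from more than
 * one chip. Assumption: a domain spanning sockets indicates a bad table.
 */
static bool
psd_domains_sane(const struct cpu_rec *cpus, size_t ncpu)
{
	int first_chip[MAX_DOMAINS];

	for (size_t d = 0; d < MAX_DOMAINS; d++)
		first_chip[d] = -1;

	for (size_t i = 0; i < ncpu; i++) {
		int d = cpus[i].psd_domain;

		assert(d >= 0 && d < MAX_DOMAINS);
		if (first_chip[d] == -1)
			first_chip[d] = cpus[i].chip_id;
		else if (first_chip[d] != cpus[i].chip_id)
			return (false);	/* domain spans two sockets */
	}
	return (true);
}

/* Use the BIOS domain only if the whole table looks sane. */
static int
effective_domain(const struct cpu_rec *cpus, size_t ncpu, size_t i)
{
	if (psd_domains_sane(cpus, ncpu))
		return (cpus[i].psd_domain);
	return (cpus[i].chip_id);
}

/* Fill in the 16-CPU layout listed in this report. */
static void
fill_reported_layout(struct cpu_rec cpus[16])
{
	for (int i = 0; i < 16; i++) {
		cpus[i].cpu_id = i;
		cpus[i].chip_id = ((i % 8) < 4) ? 0 : 1;  /* 0-3, 8-11: socket 0 */
		cpus[i].psd_domain = (i < 8) ? 0 : 1;	  /* what the BIOS claims */
	}
}
```

With the layout from this box, psd_domains_sane() returns false because
domain 0 holds cpu0~3 from socket 0 and cpu4~7 from socket 1, so
effective_domain() ends up returning the chip ID for every CPU.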

I remember Eric had a workaround for this issue, but apparently
it doesn't cover this case. :(

So I'm now suggesting that we remove the _PSD, _CSD and _TSD related
implementation altogether and build the domain info from the CPU
topology structure instead.

Any thoughts?

Thanks,
-Aubrey
