Today a colleague reported a kernel panic to me that he hit while installing Build 110 on a Nehalem EP platform. It's Supermicro's "Tylersburg EP" box with the newest BIOS. (Buggy BIOS!!!)

==================================================
$ prtdiag
System Configuration: Supermicro X8DTN
BIOS Configuration: American Megatrends Inc. 4.6.3 03/05/2009
BMC Configuration: IPMI 2.0 (KCS: Keyboard Controller Style)

==== Processor Sockets ==============================

Version                             Location Tag
----------------------------------- --------------------------
Intel(R) Xeon(R) CPU X5560 @ 2.80GH CPU 1
Intel(R) Xeon(R) CPU X5560 @ 2.80GH CPU 2
==================================================

The panic stack is as follows:

==========================================
unix: cmntrap()
unix: cmt_balance + b2    <============ Div ZERO
unix: setbackdq()
genunix: sleepq_wakeone_chan()
genunix: cv_signal()
genunix: delay_wakeup()
genunix: callout_list_expire()
genunix: callout_expire()
genunix: callout_execute()
genunix: taskq_thread()
unix: thread_start()
==========================================

After some investigation, I root-caused this problem. The buggy BIOS _PSD implementation messed up the processor group structure, see below:

=====================
Socket 0:
    cpu0~3:   in domain 0
    cpu8~11:  in domain 1
Socket 1:
    cpu4~7:   in domain 0
    cpu12~15: in domain 1
=====================

I rebuilt the kernel to obtain the domain info from cpuid_get_chipid() instead, and the problem is gone.

I remember Eric had a workaround for this issue, but apparently it doesn't cover this case. :(

So now I'm suggesting that we remove the _PSD, _CSD, and _TSD related implementation altogether and build the domain info from the CPU topology structure instead.

Any thoughts?

Thanks,
-Aubrey
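
P.S. To make the inconsistency concrete, here is a small Python model of the layout reported above. It is only a sketch, not kernel code. The names CHIP_OF, PSD_DOMAIN_OF, domains_spanning_chips() and domains_from_topology() are made up for illustration, and it assumes that a power domain is expected to stay within one socket. It flags every domain that covers CPUs on more than one chip, then shows that using the chip id as the domain id (the idea behind switching to cpuid_get_chipid()) leaves nothing to flag.

```python
from collections import defaultdict

# CPU -> chip (socket) id, taken from the layout in this mail.
CHIP_OF = {cpu: 0 for cpu in (*range(0, 4), *range(8, 12))}
CHIP_OF.update({cpu: 1 for cpu in (*range(4, 8), *range(12, 16))})

# CPU -> _PSD domain id as reported by the BIOS on this box.
PSD_DOMAIN_OF = {cpu: 0 for cpu in range(0, 8)}
PSD_DOMAIN_OF.update({cpu: 1 for cpu in range(8, 16)})


def domains_spanning_chips(domain_of, chip_of):
    """Return {domain: sorted chip ids} for each domain covering more than one chip."""
    chips = defaultdict(set)
    for cpu, domain in domain_of.items():
        chips[domain].add(chip_of[cpu])
    return {d: sorted(c) for d, c in chips.items() if len(c) > 1}


def domains_from_topology(chip_of):
    """Hypothetical fallback: use the chip id itself as the domain id."""
    return dict(chip_of)


if __name__ == "__main__":
    # Domains as reported by the BIOS: both cross the socket boundary.
    print(domains_spanning_chips(PSD_DOMAIN_OF, CHIP_OF))
    # Domains derived from topology: none cross the socket boundary.
    print(domains_spanning_chips(domains_from_topology(CHIP_OF), CHIP_OF))
```

With the BIOS-reported mapping, both domain 0 and domain 1 come back as spanning chips 0 and 1. With the topology-derived mapping, the result is empty.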
