By the way, this should be fixed in build 111 by:

6812782 additional cmt lineage validation logic needed to defend against buggy _PSDs

You should be able to work around it by setting cpupm_enabled to "0" early in
boot (via -kd).

Thanks,
-Eric




Li, Aubrey wrote:
> Today a colleague reported a kernel panic to me while he was
> installing Build 110 on a Nehalem EP platform. It's Supermicro's
> "Tylersburg EP" box with the newest BIOS. (Buggy BIOS!!!)
> ==================================================
> $ prtdiag
> System Configuration: Supermicro X8DTN
> BIOS Configuration: American Megatrends Inc. 4.6.3 03/05/2009
> BMC Configuration: IPMI 2.0 (KCS: Keyboard Controller Style)
>
> ==== Processor Sockets ==============================
>
> Version                          Location Tag
> -------------------------------- --------------------------
> Intel(R) Xeon(R) CPU           X5560  @ 2.80GH CPU 1
> Intel(R) Xeon(R) CPU           X5560  @ 2.80GH CPU 2
> =================================================
>
> panic stack as follows:
> ==========================================
> unix: cmntrap()
> unix: cmt_balance + b2  <============Div ZERO
> unix: setbackdq()
> genunix: sleepq_wakeone_chan()
> genunix: cv_signal()
> genunix: delay_wakeup()
> genunix: callout_list_expire()
> genunix: callout_expire()
> genunix: callout_execute()
> genunix: taskq_thread()
> unix: thread_start()
> =========================================
>
> After some investigation, I root-caused this problem. The buggy
> BIOS _PSD implementation messed up the processor group
> structure; see below:
> =====================
> Socket 0:
> cpu0~3:       in domain 0
> cpu8~11:      in domain 1
>
> Socket 1:
> cpu4~7:       in domain 0
> cpu12~15:     in domain 1
> =====================
> I rebuilt the kernel to obtain the domain info from cpuid_get_chipid()
> instead, and the problem is gone.
>
> I remember Eric had a workaround for this issue, but apparently
> it doesn't cover this case :(
>
> So now I'm suggesting that we remove the _PSD, _CSD, and _TSD related
> implementation entirely and build the domain info from the CPU
> topology structure instead.
>
> Any thoughts?
>
> Thanks,
> -Aubrey
>   
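
For illustration, the kind of sanity check this thread is about could look
roughly like the sketch below. It is not the code from 6812782 and not the
kernel's pg/cmt implementation; the function names, the NDOMAIN bound, and
the flat chip_of[]/domain_of[] arrays are hypothetical. It only encodes the
layout reported above (domain 0 = cpu0~7, domain 1 = cpu8~15, each spanning
both sockets) and rejects any _PSD domain whose CPUs do not all sit on one
chip.

```c
#include <stdio.h>

#define NCPU     16
#define NDOMAIN  16   /* hypothetical upper bound on _PSD domain ids */

/*
 * Fill in the layout reported in this thread:
 *   socket 0: cpu0~3 (domain 0), cpu8~11  (domain 1)
 *   socket 1: cpu4~7 (domain 0), cpu12~15 (domain 1)
 */
static void
fill_reported_layout(int chip_of[], int domain_of[])
{
    int cpu;

    for (cpu = 0; cpu < NCPU; cpu++) {
        chip_of[cpu] = (cpu / 4) % 2;
        domain_of[cpu] = cpu / 8;
    }
}

/*
 * Return 1 if every domain is fully contained in a single chip,
 * 0 if some domain spans chips or a domain id is out of range.
 */
static int
domains_nest_in_chips(const int chip_of[], const int domain_of[], int ncpu)
{
    int first_chip[NDOMAIN];
    int i;

    for (i = 0; i < NDOMAIN; i++)
        first_chip[i] = -1;         /* no CPU seen for this domain yet */

    for (i = 0; i < ncpu; i++) {
        int d = domain_of[i];

        if (d < 0 || d >= NDOMAIN)
            return (0);
        if (first_chip[d] == -1)
            first_chip[d] = chip_of[i];
        else if (first_chip[d] != chip_of[i])
            return (0);             /* domain spans two chips */
    }
    return (1);
}
```

With the reported layout the check fails, since domain 0 contains CPUs from
both sockets. A caller could then ignore the _PSD data and fall back to the
chip id, which is what the rebuilt kernel described above effectively did.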

