Hi Andrei,

Andrei Dorofeev wrote:

> Aubrey,
> 
> What version of the X8DTN BIOS are you using? I filed this issue
> back in February on premier.intel.com and it should've been
> resolved by now. Do you see this with BIOS version 3059W?

Right, right, the version is 3059W! And this is the newest version.
From the BIOS changelog, they tested Windows and Red Hat Linux. I'm not sure
if they care about what Solaris reports. :(

> 
> Don't give up on using _PSD because of this.

I believe _PSD was seldom implemented before NHM. Using the vendor-specific
CPU topology should be an acceptable way to build the domain info.
I admit I don't know much about the other CPU vendors, like SPARC and AMD.
What's the benefit of using _PSD, other than introducing a panic? ;)
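
To illustrate the topology-based approach, here is a rough sketch, not the
actual kernel change. It loads the CPU layout reported on this box, checks
whether any _PSD domain spans more than one socket, and shows the alternative
of using the chip ID as the domain. The cpu_info_t struct and all three
function names are made up for illustration; in the kernel the chip ID would
come from cpuid_get_chipid().

```c
#include <assert.h>
#include <stddef.h>

#define NCPU 16

/* Hypothetical per-CPU record; the real kernel structures differ. */
typedef struct cpu_info {
	int chip_id;	/* physical socket, as cpuid_get_chipid() would report */
	int psd_domain;	/* domain number claimed by the BIOS _PSD object */
} cpu_info_t;

/*
 * Fill in the layout from the report: cpu0-3 and cpu8-11 sit on socket 0,
 * cpu4-7 and cpu12-15 on socket 1, while _PSD puts cpu0-7 in domain 0
 * and cpu8-15 in domain 1.
 */
void
load_reported_layout(cpu_info_t *cpus, size_t ncpus)
{
	assert(ncpus == NCPU);
	for (size_t i = 0; i < ncpus; i++) {
		cpus[i].chip_id = (int)((i / 4) % 2);
		cpus[i].psd_domain = (int)(i / 8);
	}
}

/* Return 1 if any _PSD domain contains CPUs from more than one socket. */
int
psd_spans_chips(const cpu_info_t *cpus, size_t ncpus)
{
	for (size_t i = 0; i < ncpus; i++) {
		for (size_t j = i + 1; j < ncpus; j++) {
			if (cpus[i].psd_domain == cpus[j].psd_domain &&
			    cpus[i].chip_id != cpus[j].chip_id)
				return (1);
		}
	}
	return (0);
}

/* Topology-based alternative: one domain per physical socket. */
int
topo_domain(const cpu_info_t *cpu)
{
	return (cpu->chip_id);
}
```

With the reported layout, psd_spans_chips() flags the _PSD data as
inconsistent with the sockets, while topo_domain() keeps every CPU of a
socket in the same domain.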

Thanks,
-Aubrey

> 
> Thanks,
> Andrei
> 
> On Mon, Mar 30, 2009 at 7:19 AM, Mark Haywood
> <Mark.Haywood at sun.com> wrote:
>> Li, Aubrey wrote:
>>> 
>>> Today a colleague reported a kernel panic to me that he hit while
>>> installing Build110 on a Nehalem EP platform. It's Supermicro's
>>> "Tylersburg EP" box with the newest BIOS. (Buggy BIOS!!!)
>>> ==================================================
>>> $ prtdiag
>>> System Configuration: Supermicro X8DTN
>>> BIOS Configuration: American Megatrends Inc. 4.6.3 03/05/2009
>>> BMC Configuration: IPMI 2.0 (KCS: Keyboard Controller Style)
>>> 
>>> ==== Processor Sockets ==============================
>>> 
>>> Version                          Location Tag
>>> -------------------------------- --------------------------
>>> Intel(R) Xeon(R) CPU           X5560  @ 2.80GH CPU 1
>>> Intel(R) Xeon(R) CPU           X5560  @ 2.80GH CPU 2
>>> =================================================
>>> 
>>> panic stack as follows:
>>> ==========================================
>>> unix: cmntrap()
>>> unix: cmt_balance + b2  <============ Div ZERO
>>> unix: setbackdq()
>>> genunix: sleepq_wakeone_chan()
>>> genunix: cv_signal()
>>> genunix: delay_wakeup()
>>> genunix: callout_list_expire()
>>> genunix: callout_expire()
>>> genunix: callout_execute()
>>> genunix: taskq_thread()
>>> unix: thread_start()
>>> =========================================
>>> 
>>> After some investigation, I root caused this problem. The buggy
>>> BIOS _PSD implementation messed the processor group
>>> structure up, see below
>>> =====================
>>> Socket 0:
>>> cpu0~3:         in domain 0
>>> cpu8~11:        in domain 1
>>> 
>>> Socket 1:
>>> cpu4~7:         in domain 0
>>> cpu12~15:       in domain 1
>>> =====================
>>> I rebuilt a kernel that obtains the domain info from cpuid_get_chipid()
>>> instead, and the problem is gone.
>>> 
>>> I remember Eric had a workaround for this issue, but apparently
>>> it doesn't cover this case. :(
>>> 
>>> So now I'm suggesting that we remove the _PSD, _CSD and _TSD related
>>> implementation entirely and build the domain info from the CPU topology
>>> structure instead.
>>> 
>>> Any thoughts?
>>> 
>> 
>> Are we running into that many problems with these objects? I'd hate
>> to see us circumvent these ACPI objects because of rare BIOS bugs.
>> Next thing you know we'll be building our own state (_PSS, _TSS,
>> _CST) tables as well. 
>> 
>> I imagine that we could determine the domain structure based on
>> topology. However, I'm uncertain about how we determine domain
>> type. Is that straightforward?
>> 
>> If we go this route, then we probably want to make the code that
>> builds the domains vendor specific. Because I'm not sure that all
>> CPU vendors will have the same algorithm for determining domains.
>> And we might just have the other vendors default to the _PSD.
>> 
>>> Thanks,
>>> -Aubrey
>>> 
>> 
>> _______________________________________________
>> tesla-dev mailing list
>> tesla-dev at opensolaris.org
>> http://mail.opensolaris.org/mailman/listinfo/tesla-dev

