Hi Andrei,

Andrei Dorofeev wrote:
> Aubrey,
>
> What version of the X8DTN BIOS are you using? I filed
> this issue back in February on premier.intel.com and it should've been
> resolved by now. Do you see this with BIOS version 3059W?

Right, right, the version is 3059W! And this is the newest version.
From the BIOS changelog, they tested Windows and Red Hat Linux; I'm not
sure if they care about what Solaris reported, :(

>
> Don't give up on using _PSD because of this.

I believe before NHM, _PSD was seldom implemented. Using the vendor
specific CPU topology should be an acceptable way to build domain info.
I admit I don't have the knowledge about the other CPU vendors, like
SPARC and AMD. What's the benefit of using _PSD? Other than
introducing panics, ;)

Thanks,
-Aubrey

>
> Thanks,
> Andrei
>
> On Mon, Mar 30, 2009 at 7:19 AM, Mark Haywood
> <Mark.Haywood at sun.com> wrote:
>> Li, Aubrey wrote:
>>>
>>> Today a colleague reported a kernel panic issue to me when he was
>>> installing Build110 on a Nehalem EP platform. It's Supermicro's
>>> "Tylersburg EP" box with the newest BIOS. (Buggy BIOS!!!)
>>> ==================================================
>>> $ prtdiag
>>> System Configuration: Supermicro X8DTN
>>> BIOS Configuration: American Megatrends Inc. 4.6.3 03/05/2009
>>> BMC Configuration: IPMI 2.0 (KCS: Keyboard Controller Style)
>>>
>>> ==== Processor Sockets ==============================
>>>
>>> Version                          Location Tag
>>> -------------------------------- --------------------------
>>> Intel(R) Xeon(R) CPU           X5560  @ 2.80GH CPU 1
>>> Intel(R) Xeon(R) CPU           X5560  @ 2.80GH CPU 2
>>> =================================================
>>>
>>> panic stack as follows:
>>> ==========================================
>>> unix: cmntrap()
>>> unix: cmt_balance + b2  <============Div ZERO
>>> unix: setbackdq()
>>> genunix: sleepq_wakeone_chan()
>>> genunix: cv_signal()
>>> genunix: delay_wakeup()
>>> genunix: callout_list_expire()
>>> genunix: callout_expire()
>>> genunix: callout_execute()
>>> genunix: taskq_thread()
>>> unix: thread_start()
>>> =========================================
>>>
>>> After some investigation, I root caused this problem. The buggy
>>> BIOS _PSD implementation messed the processor group
>>> structure up, see below
>>> =====================
>>> Socket 0:
>>> cpu0~3:         in domain 0
>>> cpu8~11:        in domain 1
>>>
>>> Socket 1:
>>> cpu4~7:         in domain 0
>>> cpu12~15:       in domain 1
>>> =====================
>>> I rebuilt a kernel to obtain the domain info by cpuid_get_chipid()
>>> instead and the problem is gone.
>>>
>>> I remember Eric had a workaround for this issue but apparently
>>> it doesn't cover this case, :(
>>>
>>> So, now I'm suggesting we remove the _PSD, _CSD and _TSD related
>>> implementation altogether. We build the domain info from the cpu
>>> topology structure instead.
>>>
>>> Any thoughts?
>>>
>>
>> Are we running into that many problems with these objects? I'd hate
>> to see us circumvent these ACPI objects because of rare BIOS bugs.
>> Next thing you know we'll be building our own state (_PSS, _TSS,
>> _CST) tables as well.
>>
>> I imagine that we could determine the domain structure based on
>> topology. However, I'm uncertain about how we determine domain
>> type? Is that straightforward?
>>
>> If we go this route, then we probably want to make the code that
>> builds the domains vendor specific. Because I'm not sure that all
>> CPU vendors will have the same algorithm for determining domains.
>> And we might just have the other vendors default to the _PSD.
>>
>>> Thanks,
>>> -Aubrey
>>>
>>
>> _______________________________________________
>> tesla-dev mailing list
>> tesla-dev at opensolaris.org
>> http://mail.opensolaris.org/mailman/listinfo/tesla-dev
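
For illustration only, the mismatch between the _PSD domains and the
socket layout in the quoted report can be checked mechanically. This is
a minimal sketch in Python, not the Solaris kernel code. The table of
(chip id, _PSD domain) pairs is hypothetical data copied from the
layout in the report, and all function names are made up for this
example. Treating a domain that spans sockets as suspect is an
assumption drawn from this report, not a general rule.

```python
from collections import defaultdict


def reported_layout():
    """Hypothetical table mirroring the report: cpu -> (chip_id, psd_domain)."""
    layout = {}
    for cpu in range(16):
        # Socket 0 holds cpu0-3 and cpu8-11; socket 1 holds the rest.
        chip = 0 if (0 <= cpu <= 3 or 8 <= cpu <= 11) else 1
        # The BIOS _PSD objects put cpu0-7 in domain 0 and cpu8-15 in domain 1.
        domain = 0 if cpu < 8 else 1
        layout[cpu] = (chip, domain)
    return layout


def group_by(layout, index):
    """Group CPU ids by chip id (index 0) or by _PSD domain (index 1)."""
    groups = defaultdict(set)
    for cpu, ids in layout.items():
        groups[ids[index]].add(cpu)
    return dict(groups)


def domains_spanning_chips(layout):
    """Return the _PSD domain ids whose CPUs sit on more than one chip."""
    chips_per_domain = defaultdict(set)
    for chip, domain in layout.values():
        chips_per_domain[domain].add(chip)
    return sorted(d for d, chips in chips_per_domain.items() if len(chips) > 1)
```

With this table, domains_spanning_chips() returns [0, 1], so both
reported domains cross the socket boundary, while group_by(layout, 0)
gives one group per socket, which is the grouping the chip-id based
kernel would use.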
