Mark Haywood wrote:
> Mark Haywood wrote:
>> Li, Aubrey wrote:
>>> Hi Andrei,
>>>
>>> Andrei Dorofeev wrote:
>>>
>>>> Aubrey,
>>>>
>>>> What version of the X8DTN BIOS are you using? I filed this issue
>>>> back in February on premier.intel.com and it should have been
>>>> resolved by now. Do you see this with BIOS version 3059W?
>>>>
>>>
>>> Right, right, the version is 3059W! And this is the newest version.
>>> From the BIOS changelog, they tested Windows and Red Hat Linux; I'm not
>>> sure they care about what Solaris reports. :(
>>>
>>>> Don't give up on using _PSD because of this.
>>>>
>>>
>>> I believe _PSD was seldom implemented before NHM. Using the
>>> vendor-specific CPU topology should be an acceptable way to build the
>>> domain info. I admit I don't know much about other CPU vendors, like
>>> SPARC and AMD. What's the benefit of using _PSD, other than
>>> introducing panics? ;)
>>>
>>
>> That's a very good question. What is the benefit of the _PSD? I
>> believe the _PSD was introduced to prevent Solaris and other
>> operating systems from having to do exactly what you are proposing
>> (i.e., introducing vendor-specific details of processor state domains
>> into the operating system). The _PSD is supposed to define a standard
>> way for operating systems to digest the domain data. Unless there is
>> a really compelling reason to ignore the _PSD, I would suggest that
>> we continue to use it.
>
> How about a compromise? Since the _PSD often doesn't exist, we have to
> be able to determine the domains from the topology anyway. Given that,
> why don't we digest the _PSD and then verify it against the topology?
> Why do this? So that we can report bad _PSDs to the BIOS developers,
> and the _PSDs become more reliable in the future. If a faulty _PSD is
> identified, we can log a message to the console and use domains built
> from the topology. At some point it would be nice if we didn't have to
> keep determining the domains from the topology. Sound reasonable?

Sounds good to me.

Anup
>
> Mark
>
>>
>> Mark
>>
>>> Thanks,
>>> -Aubrey
>>>
>>>> Thanks,
>>>> Andrei
>>>>
>>>> On Mon, Mar 30, 2009 at 7:19 AM, Mark Haywood
>>>> <Mark.Haywood at sun.com> wrote:
>>>>
>>>>> Li, Aubrey wrote:
>>>>>
>>>>>> Today a colleague reported a kernel panic to me while he was
>>>>>> installing Build110 on a Nehalem EP platform. It's Supermicro's
>>>>>> "Tylersburg EP" box with the newest BIOS (buggy BIOS!!!).
>>>>>> ==================================================
>>>>>> $ prtdiag
>>>>>> System Configuration: Supermicro X8DTN
>>>>>> BIOS Configuration: American Megatrends Inc. 4.6.3 03/05/2009
>>>>>> BMC Configuration: IPMI 2.0 (KCS: Keyboard Controller Style)
>>>>>>
>>>>>> ==== Processor Sockets ==============================
>>>>>>
>>>>>> Version                          Location Tag
>>>>>> -------------------------------- --------------------------
>>>>>> Intel(R) Xeon(R) CPU X5560 @ 2.80GH CPU 1
>>>>>> Intel(R) Xeon(R) CPU X5560 @ 2.80GH CPU 2
>>>>>> =================================================
>>>>>>
>>>>>> The panic stack is as follows:
>>>>>> ==========================================
>>>>>> unix: cmntrap()
>>>>>> unix: cmt_balance + b2 <============ Div ZERO
>>>>>> unix: setbackdq()
>>>>>> genunix: sleepq_wakeone_chan()
>>>>>> genunix: cv_signal()
>>>>>> genunix: delay_wakeup()
>>>>>> genunix: callout_list_expire()
>>>>>> genunix: callout_expire()
>>>>>> genunix: callout_execute()
>>>>>> genunix: taskq_thread()
>>>>>> unix: thread_start()
>>>>>> =========================================
>>>>>>
>>>>>> After some investigation, I root-caused this problem. The buggy
>>>>>> BIOS _PSD implementation messed up the processor group
>>>>>> structure; see below:
>>>>>> =====================
>>>>>> Socket 0:
>>>>>> cpu0~3: in domain 0
>>>>>> cpu8~11: in domain 1
>>>>>>
>>>>>> Socket 1:
>>>>>> cpu4~7: in domain 0
>>>>>> cpu12~15: in domain 1
>>>>>> =====================
>>>>>> I rebuilt a kernel that obtains the domain info from
>>>>>> cpuid_get_chipid() instead, and the problem is gone.
>>>>>>
>>>>>> I remember Eric had a workaround for this issue, but apparently
>>>>>> it doesn't cover this case. :(
>>>>>>
>>>>>> So now I'm suggesting we remove the _PSD, _CSD and _TSD related
>>>>>> implementation altogether and build the domain info from the CPU
>>>>>> topology structure instead.
>>>>>> Any thoughts?
>>>>>>
>>>>>
>>>>> Are we running into that many problems with these objects? I'd hate
>>>>> to see us circumvent these ACPI objects because of rare BIOS bugs.
>>>>> Next thing you know we'll be building our own state (_PSS, _TSS,
>>>>> _CST) tables as well.
>>>>> I imagine that we could determine the domain structure based on
>>>>> topology. However, I'm uncertain how we would determine the domain
>>>>> type. Is that straightforward?
>>>>> If we go this route, then we probably want to make the code that
>>>>> builds the domains vendor-specific, because I'm not sure that all
>>>>> CPU vendors will have the same algorithm for determining domains.
>>>>> And we might just have the other vendors default to the _PSD.
>>>>>
>>>>>> Thanks,
>>>>>> -Aubrey
>>>>>>
>>>>> _______________________________________________
>>>>> tesla-dev mailing list
>>>>> tesla-dev at opensolaris.org
>>>>> http://mail.opensolaris.org/mailman/listinfo/tesla-dev
