Mark Haywood wrote:
> Li, Aubrey wrote:
>> Hi Andrei,
>>
>> Andrei Dorofeev wrote:
>>
>>> Aubrey,
>>>
>>> What version of the X8DTN BIOS are you using? I filed
>>> this issue back in February on premier.intel.com and it should've been
>>> resolved by now. Do you see this with BIOS version 3059W?
>>
>> Right, right, the version is 3059W! And this is the newest version.
>> From the BIOS changelog, they tested Windows and Red Hat Linux; I'm not
>> sure if they care about what Solaris reported, :(
>>
>>> Don't give up on using _PSD because of this.
>>
>> I believe before NHM, _PSD was seldom implemented. Using the vendor
>> specific CPU topology should be an acceptable way to build domain info.
>> I admit I don't know much about the other CPU vendors, like SPARC
>> and AMD. What's the benefit of using _PSD, other than introducing
>> panic? ;)
>
> That's a very good question. What is the benefit of the _PSD? I
> believe the _PSD was introduced to prevent Solaris and other operating
> systems from having to do exactly what you are proposing (i.e.,
> introducing vendor specific details of processor state domains into
> the operating system). The _PSD is supposed to define a standard way
> for operating systems to digest the domain data. Unless there is a
> really compelling reason to ignore the _PSD, I would suggest that we
> continue to use it.

How about a compromise? Since the _PSD often doesn't exist, we have to be
able to determine the domains from the topology. Since we have to do this
anyway, why don't we digest the _PSD and then verify it against the
topology? Why do this? So that we can report bad _PSDs to the BIOS
developers, so that the _PSDs become more reliable in the future. If a
faulty _PSD is identified, we can log a message to the console and use
domains built from the topology. At some point it would be nice if we
didn't have to continue determining the domains using topology.

Sound reasonable?

Mark

>
> Mark
>
>> Thanks,
>> -Aubrey
>>
>>> Thanks,
>>> Andrei
>>>
>>> On Mon, Mar 30, 2009 at 7:19 AM, Mark Haywood
>>> <Mark.Haywood at sun.com> wrote:
>>>
>>>> Li, Aubrey wrote:
>>>>
>>>>> Today a colleague reported a kernel panic to me that he hit while
>>>>> installing Build110 on a Nehalem EP platform. It's Supermicro's
>>>>> "Tylersburg EP" box with the newest BIOS. (Buggy BIOS!!!)
>>>>> ==================================================
>>>>> $ prtdiag
>>>>> System Configuration: Supermicro X8DTN
>>>>> BIOS Configuration: American Megatrends Inc. 4.6.3 03/05/2009
>>>>> BMC Configuration: IPMI 2.0 (KCS: Keyboard Controller Style)
>>>>>
>>>>> ==== Processor Sockets ==============================
>>>>>
>>>>> Version                              Location Tag
>>>>> -------------------------------- --------------------------
>>>>> Intel(R) Xeon(R) CPU X5560 @ 2.80GH  CPU 1
>>>>> Intel(R) Xeon(R) CPU X5560 @ 2.80GH  CPU 2
>>>>> =================================================
>>>>>
>>>>> The panic stack is as follows:
>>>>> ==========================================
>>>>> unix: cmntrap()
>>>>> unix: cmt_balance + b2  <============ divide by zero
>>>>> unix: setbackdq()
>>>>> genunix: sleepq_wakeone_chan()
>>>>> genunix: cv_signal()
>>>>> genunix: delay_wakeup()
>>>>> genunix: callout_list_expire()
>>>>> genunix: callout_expire()
>>>>> genunix: callout_execute()
>>>>> genunix: taskq_thread()
>>>>> unix: thread_start()
>>>>> =========================================
>>>>>
>>>>> After some investigation, I root-caused this problem. The buggy
>>>>> BIOS _PSD implementation messed up the processor group
>>>>> structure, see below:
>>>>> =====================
>>>>> Socket 0:
>>>>> cpu0~3: in domain 0
>>>>> cpu8~11: in domain 1
>>>>>
>>>>> Socket 1:
>>>>> cpu4~7: in domain 0
>>>>> cpu12~15: in domain 1
>>>>> =====================
>>>>> I rebuilt a kernel to obtain the domain info from cpuid_get_chipid()
>>>>> instead, and the problem is gone.
>>>>>
>>>>> I remember Eric had a workaround for this issue, but apparently
>>>>> it doesn't cover this case, :(
>>>>>
>>>>> So, now I'm suggesting that we remove the _PSD, _CSD and _TSD related
>>>>> implementation altogether. We build the domain info from the cpu
>>>>> topology structure instead.
>>>>> Any thoughts?
>>>>>
>>>> Are we running into that many problems with these objects? I'd hate
>>>> to see us circumvent these ACPI objects because of rare BIOS bugs.
>>>> Next thing you know we'll be building our own state (_PSS, _TSS,
>>>> _CST) tables as well.
>>>> I imagine that we could determine the domain structure based on
>>>> topology. However, I'm uncertain about how we determine domain
>>>> type. Is that straightforward?
>>>> If we go this route, then we probably want to make the code that
>>>> builds the domains vendor specific, because I'm not sure that all
>>>> CPU vendors will have the same algorithm for determining domains.
>>>> And we might just have the other vendors default to the _PSD.
>>>>
>>>>> Thanks,
>>>>> -Aubrey
>>>>>
>>>> _______________________________________________
>>>> tesla-dev mailing list
>>>> tesla-dev at opensolaris.org
>>>> http://mail.opensolaris.org/mailman/listinfo/tesla-dev
>>>>
>>
>
> _______________________________________________
> tesla-dev mailing list
> tesla-dev at opensolaris.org
> http://mail.opensolaris.org/mailman/listinfo/tesla-dev
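
For illustration only, the compromise proposed in the reply above (digest the _PSD, verify it against the topology, log and fall back when it is faulty) could be sketched roughly as follows. This is Python pseudologic, not kernel code. The function names, the dictionary inputs and the consistency rule (no _PSD domain may span more than one chip) are all assumptions made for the sketch; the thread does not say what the actual verification should check.

```python
from collections import defaultdict


def partition(cpu_to_group):
    """Turn a {cpu_id: group_id} mapping into a set of frozensets of CPU ids."""
    groups = defaultdict(set)
    for cpu, group in cpu_to_group.items():
        groups[group].add(cpu)
    return {frozenset(cpus) for cpus in groups.values()}


def psd_is_consistent(psd, chip_ids):
    """Assumed rule: the _PSD covers exactly the CPUs the topology knows
    about, and no _PSD domain spans more than one chip."""
    if set(psd) != set(chip_ids):
        return False
    for cpus in partition(psd):
        if len({chip_ids[cpu] for cpu in cpus}) != 1:
            return False
    return True


def build_domains(psd, chip_ids, log=print):
    """Return the domains to use.

    psd      -- hypothetical {cpu_id: domain_id} digested from the _PSD,
                or None when the BIOS provides no _PSD
    chip_ids -- hypothetical {cpu_id: chip_id}, e.g. what the kernel would
                get per CPU from cpuid_get_chipid()
    """
    topo_domains = partition(chip_ids)
    if psd is None:
        # No _PSD at all: topology is the only source.
        return topo_domains
    if not psd_is_consistent(psd, chip_ids):
        # Faulty _PSD: report it so it can be raised with the BIOS vendor,
        # then fall back to the topology-derived domains.
        log("WARNING: _PSD disagrees with CPU topology; "
            "using topology-derived domains")
        return topo_domains
    return partition(psd)
```

Fed the layout from the original report (cpu0~3 and cpu8~11 on one socket, cpu4~7 and cpu12~15 on the other, with the _PSD grouping cpu0~7 and cpu8~15), this check rejects the _PSD because each of its domains covers CPUs from both sockets. Whether a stricter or looser rule is appropriate, and how to determine the domain type, is left open, as in the thread.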
