Mark Haywood wrote:
> Mark Haywood wrote:
>> Li, Aubrey wrote:
>>> Hi Andrei,
>>>
>>> Andrei Dorofeev wrote:
>>>
>>>  
>>>> Aubrey,
>>>>
>>>> What version of the X8DTN BIOS are you using?  I filed this
>>>> issue back in February on premier.intel.com and it should have been
>>>> resolved by now.  Do you see this with BIOS version 3059W?
>>>>     
>>>
>>> Right, right, the version is 3059W! And this is the newest version.
>>> According to the BIOS changelog, they tested Windows and Red Hat
>>> Linux. I'm not sure they care about what Solaris reported, :(
>>>
>>>  
>>>> Don't give up on using _PSD because of this.
>>>>     
>>>
>>> I believe that before NHM, _PSD was seldom implemented. Using the
>>> vendor-specific CPU topology should be an acceptable way to build
>>> domain info. I admit I don't know much about the other CPU vendors,
>>> like SPARC and AMD. What's the benefit of using _PSD, other than
>>> introducing a panic? ;)
>>>   
>>
>> That's a very good question. What is the benefit of the _PSD? I 
>> believe the _PSD was introduced to prevent Solaris and other 
>> operating systems from having to do exactly what you are proposing 
>> (i.e., introducing vendor specific details of processor state domains 
>> into the operating system). The _PSD is supposed to define a standard 
>> way for operating systems to digest the domain data. Unless there is 
>> a really compelling reason to ignore the _PSD, I would suggest that 
>> we continue to use it.
>
> How about a compromise? Since the _PSD often doesn't exist, we have to 
> be able to determine the domains from the topology. Since we have to 
> do this anyway, why don't we digest the _PSD and then verify it 
> against the topology? Why do this? So that we can report bad _PSDs to 
> the BIOS developers, and the _PSDs become more reliable in the 
> future. If a faulty _PSD is identified, we can log a message to the 
> console and use domains built from the topology. At some point it 
> would be nice if we no longer had to determine the domains from the 
> topology. Sound reasonable?
Sounds good to me.
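
To make the proposal concrete, here is a rough userland sketch of the
verify-then-fall-back idea. It is only an illustration, not the actual
Solaris code: the cpu_dom_t record and the functions
psd_matches_topology(), build_domains() and fill_reported_layout() are
hypothetical names. It assumes a _PSD is "bad" when one domain contains
CPUs from different chips, and it reads the listing in the original
report as domain 0 and domain 1 each spanning both sockets. In the
kernel the warning would presumably go through cmn_err(CE_WARN, ...)
rather than fprintf().

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdio.h>

/* Hypothetical per-CPU record; not the real Solaris structures. */
typedef struct cpu_dom {
	int cpu_id;
	int chip_id;	/* from topology, e.g. cpuid_get_chipid() */
	int psd_domain;	/* domain number read from the BIOS _PSD */
} cpu_dom_t;

/*
 * Assumed check: a _PSD is considered faulty if any two CPUs share a
 * _PSD domain but sit on different chips. Logs the first mismatch.
 */
static bool
psd_matches_topology(const cpu_dom_t *cpus, size_t ncpus)
{
	assert(cpus != NULL);
	for (size_t i = 0; i < ncpus; i++) {
		for (size_t j = i + 1; j < ncpus; j++) {
			if (cpus[i].psd_domain == cpus[j].psd_domain &&
			    cpus[i].chip_id != cpus[j].chip_id) {
				fprintf(stderr, "WARNING: _PSD domain %d "
				    "spans chips %d and %d (cpu%d, cpu%d)\n",
				    cpus[i].psd_domain, cpus[i].chip_id,
				    cpus[j].chip_id, cpus[i].cpu_id,
				    cpus[j].cpu_id);
				return (false);
			}
		}
	}
	return (true);
}

/* Use the _PSD domains if they verify, else fall back to chip ids. */
static void
build_domains(const cpu_dom_t *cpus, size_t ncpus, int *domain_out)
{
	bool psd_ok = psd_matches_topology(cpus, ncpus);

	assert(domain_out != NULL);
	for (size_t i = 0; i < ncpus; i++)
		domain_out[i] = psd_ok ? cpus[i].psd_domain : cpus[i].chip_id;
}

/*
 * Fill in the layout from the report (as interpreted above):
 * socket 0 holds cpu0-3 and cpu8-11, socket 1 holds cpu4-7 and
 * cpu12-15; _PSD puts cpu0-7 in domain 0 and cpu8-15 in domain 1.
 */
static void
fill_reported_layout(cpu_dom_t *cpus, size_t ncpus)
{
	for (size_t i = 0; i < ncpus; i++) {
		cpus[i].cpu_id = (int)i;
		cpus[i].chip_id = (int)((i / 4) % 2);
		cpus[i].psd_domain = (int)(i / 8);
	}
}
```

With the 16-CPU layout from the report, the check should reject the
_PSD and build_domains() should hand back the chip ids instead. A
per-core _PSD (several domains inside one chip) would still pass, since
the check only looks for domains that cross chip boundaries.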

Anup
>
> Mark
>
>>
>> Mark
>>
>>> Thanks,
>>> -Aubrey
>>>
>>>  
>>>> Thanks,
>>>> Andrei
>>>>
>>>> On Mon, Mar 30, 2009 at 7:19 AM, Mark Haywood
>>>> <Mark.Haywood at sun.com> wrote:
>>>>   
>>>>> Li, Aubrey wrote:
>>>>>     
>>>>>> Today a colleague reported a kernel panic to me that he hit while
>>>>>> installing Build110 on a Nehalem EP platform. It's Supermicro's
>>>>>> "Tylersburg EP" box with the newest BIOS. (Buggy BIOS!!!)
>>>>>> ==================================================
>>>>>> $ prtdiag
>>>>>> System Configuration: Supermicro X8DTN
>>>>>> BIOS Configuration: American Megatrends Inc. 4.6.3 03/05/2009
>>>>>> BMC Configuration: IPMI 2.0 (KCS: Keyboard Controller Style)
>>>>>>
>>>>>> ==== Processor Sockets ==============================
>>>>>>
>>>>>> Version                          Location Tag
>>>>>> -------------------------------- --------------------------
>>>>>> Intel(R) Xeon(R) CPU           X5560  @ 2.80GH CPU 1
>>>>>> Intel(R) Xeon(R) CPU           X5560  @ 2.80GH CPU 2
>>>>>> =================================================
>>>>>>
>>>>>> panic stack as follows:
>>>>>> ==========================================
>>>>>> unix: cmntrap()
>>>>>> unix: cmt_balance + b2  <============Div ZERO
>>>>>> unix: setbackdq()
>>>>>> genunix: sleepq_wakeone_chan()
>>>>>> genunix: cv_signal()
>>>>>> genunix: delay_wakeup()
>>>>>> genunix: callout_list_expire()
>>>>>> genunix: callout_expire()
>>>>>> genunix: callout_execute()
>>>>>> genunix: taskq_thread()
>>>>>> unix: thread_start()
>>>>>> =========================================
>>>>>>
>>>>>> After some investigation, I root-caused the problem. The buggy
>>>>>> BIOS _PSD implementation messed up the processor group
>>>>>> structure, see below:
>>>>>> =====================
>>>>>> Socket 0:
>>>>>> cpu0~3: in domain 0
>>>>>> cpu8~11:        in domain 1
>>>>>>
>>>>>> Socket 1:
>>>>>> cpu4~7: in domain 0
>>>>>> cpu12~15:       in domain 1
>>>>>> =====================
>>>>>> I rebuilt a kernel that obtains the domain info from
>>>>>> cpuid_get_chipid() instead, and the problem is gone.
>>>>>>
>>>>>> I remember Eric had a workaround for this issue, but apparently
>>>>>> it doesn't cover this case, :(
>>>>>>
>>>>>> So, I'm now suggesting that we remove the _PSD, _CSD and _TSD
>>>>>> related implementation entirely and build the domain info from
>>>>>> the CPU topology structure instead.
>>>>>> Any thoughts?
>>>>>>
>>>>>>         
>>>>> Are we running into that many problems with these objects? I'd hate
>>>>> to see us circumvent these ACPI objects because of rare BIOS bugs.
>>>>> Next thing you know we'll be building our own state (_PSS, _TSS,
>>>>> _CST) tables as well.
>>>>> I imagine that we could determine the domain structure based on
>>>>> topology.  However, I'm uncertain about how we would determine the
>>>>> domain type. Is that straightforward?
>>>>> If we go this route, then we probably want to make the code that
>>>>> builds the domains vendor specific, because I'm not sure that all
>>>>> CPU vendors will have the same algorithm for determining domains.
>>>>> And we might just have the other vendors default to the _PSD.
>>>>>
>>>>>     
>>>>>> Thanks,
>>>>>> -Aubrey
>>>>>>
>>>>>>         
>>>>> _______________________________________________
>>>>> tesla-dev mailing list
>>>>> tesla-dev at opensolaris.org
>>>>> http://mail.opensolaris.org/mailman/listinfo/tesla-dev
>>>>>       
>>>
>>>   
>>
>

