Li, Aubrey wrote:
> Robert.Johnston at Sun.COM wrote:
>
>   
>> Bill Holler wrote:
>>     
>>> I cc'ed Rob Johnson, who works on FMA (fault management), to see if he
>>> knows of a CPU/fan temperature and status mechanism in Solaris.
>>>
>>> SPARC based servers monitor/control fans through an embedded system
>>> controller without visibility from Solaris.
>>>
>>> Bill
>>>
>>> Li, Aubrey wrote:
>>>       
>>>> Hi,
>>>>
>>>> Are there any existing tools or interfaces on Solaris that can monitor
>>>> CPU temperature and control fan status?
>>>>         
>> Unfortunately, it varies from platform to platform.  On your typical
>> x86 workstation, the sensor ICs that monitor things like ambient
>> temperature or fan speed sit on the I2C bus or SMBus behind the
>> southbridge.  There are SMBus and I2C nexus and slave device drivers
>> in Solaris, but they are currently SPARC-only.  As Casper mentioned on
>> another list, I believe some of the sensor data may be available in
>> ACPI space.
>>
>> The story is better on our x86 server platforms.  Those all have
>> baseboard management controllers which hide the I2C stuff and provide
>> a somewhat standardized higher-level interface (IPMI) for obtaining
>> sensor readings.  Solaris bundles a CLI (ipmitool) which you can use
>> for that.  If you are not looking for a programming or scriptable
>> interface, you can also connect to the ILOM web interface and monitor
>> sensor and indicator states there.
>>
>> We are also looking into extending libipmi and integrating it with
>> libtopo to provide a more generalized programming interface for
>> discovering and manipulating sensors (as part of the FMA sensor
>> architecture - http://www.opensolaris.org/os/project/sensors/) but
>> that's still a little ways off.
>>
>> I'm not terribly familiar with what's available on SPARC platforms,
>> but I'm guessing that since you're from Intel, that's not what you're
>> interested in :).
>>
>>     
> OK, thanks for the great explanation. I believe the problem is serious
> on some mobile platforms; I have at least one such machine. Once control
> passes from the BIOS and GRUB to Solaris, the CPU fan stops and never
> runs again, even as the CPU gets hotter and hotter. (I need to figure
> out whether some hardware in my box is broken.) So, as Bill pointed
> out, I get the "press any key to reboot" message at that point.
>
> Bill.Holler wrote:
>   
>> It would really help debug these situations if Solaris had a CPU
>> temperature and fan monitor with good reporting capability....
>
> Thanks,
> -Aubrey
>   
We have been in this situation, where a (non-Sun) machine shut off due
to overheating.  :-(   Thermal monitoring capability would have been
very valuable for seeing what was overheating.

Good CPU temperature monitoring and fan control in Solaris could also
be used to minimize how often CPUs enter thermal throttle states.

The temperature of a device lags its power consumption.  Solaris knows
when a CPU is highly utilized before the temperature increases.  We could
experiment with having Solaris increase fan speed based on scheduling
data *before* the temperature sensors show excessive temperature.  :-)
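
To make the idea a bit more concrete, here is a rough Python sketch of
utilization-driven ("feed-forward") fan control next to plain
temperature-driven control.  Everything in it is an assumption made for
illustration: the function names, the thresholds, and the toy thermal
model with its constants are made up, and nothing here corresponds to
an existing Solaris interface.

```python
def reactive_duty(temp_c, t_low=50.0, t_high=80.0, min_duty=0.2):
    """Fan duty (0..1) from temperature alone: linear ramp from t_low to t_high."""
    frac = (temp_c - t_low) / (t_high - t_low)
    return min(1.0, max(min_duty, frac))


def predictive_duty(temp_c, cpu_util, util_gain=0.8):
    """Fan duty that also follows CPU utilization (0..1), which rises before temperature."""
    return min(1.0, max(reactive_duty(temp_c), util_gain * cpu_util))


def simulate(controller, steps=120, dt=1.0):
    """Toy first-order thermal model; returns the temperature after each step."""
    ambient, temp, heat_capacity = 25.0, 40.0, 30.0   # made-up constants
    trace = []
    for step in range(steps):
        util = 0.1 if step < 10 else 1.0          # load jumps at step 10
        duty = controller(temp, util)
        power = 10.0 + 90.0 * util                # watts dissipated (hypothetical)
        conductance = 0.5 + 2.0 * duty            # W per degree C removed by airflow
        temp += dt * (power - conductance * (temp - ambient)) / heat_capacity
        trace.append(temp)
    return trace


reactive = simulate(lambda temp, util: reactive_duty(temp))
predictive = simulate(predictive_duty)
print("peak, temperature-only control:   %.1f C" % max(reactive))
print("peak, utilization feed-forward:   %.1f C" % max(predictive))
```

In this toy model the feed-forward controller spins the fan up as soon
as the load jumps, so its temperature trace never runs hotter than the
temperature-only one.  Whether that holds on real hardware would need
the experiment described above.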

Regards,
Bill


