Quick question: since it looks like additional dtrace probe(s) will be 
added, will it be possible to do "versioning" of powertop so that it 
does the right thing whether or not the new probes are available? This 
is especially important since powertop is unbundled.
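
To make the question concrete, here is a rough sketch of the kind of run-time check I have in mind. It is only an illustration, not how powertop works today: the probe name is made up, and it assumes that `dtrace -l -n <probe>` exits non-zero when no probe matches the description.

```python
import subprocess

# Hypothetical probe name, for illustration only; the real name would be
# whatever the kernel change ends up adding.
OPTIONAL_PROBES = {
    "turbo": "sdt:::example-turbo-probe",
}


def probe_available(name, run=subprocess.run):
    """Return True if `dtrace -l -n name` lists at least one probe.

    Assumes dtrace exits non-zero when nothing matches the description.
    """
    try:
        result = run(["dtrace", "-l", "-n", name],
                     capture_output=True, text=True)
    except OSError:
        # dtrace is missing or cannot be executed on this system.
        return False
    return result.returncode == 0


def enabled_features(optional=OPTIONAL_PROBES, check=probe_available):
    """Map each optional feature to whether its probe exists here."""
    return {feature: check(probe) for feature, probe in optional.items()}
```

An unbundled powertop could run a check like this once at startup and simply hide any column whose probe is missing.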

Darrin

Mark Haywood wrote:
> Li, Aubrey wrote:
>> Mark.Haywood at Sun.COM wrote:
>>
>>   
>>> Li, Aubrey wrote:
>>>     
>>>> Mark.Haywood wrote:
>>>>
>>>>
>>>>       
>>>>> Li, Aubrey wrote:
>>>>>
>>>>>         
>>>>>> Mark.Haywood wrote:
>>>>>>
>>>>>>
>>>>>>
>>>>>>           
>>>>>>> Li, Aubrey wrote:
>>>>>>>
>>>>>>>
>>>>>>>             
>>>>>>>> First, many thanks for reporting this bug. :-)
>>>>>>>>
>>>>>>>> Until we set up a powertop bug tracking system on
>>>>>>>> defect.opensolaris.org, can we report bugs to the mailing list
>>>>>>>> first?
>>>>>>>>
>>>>>>>> My comments on this bug:
>>>>>>>>
>>>>>>>> This is a very good point, really, but I think it would be better
>>>>>>>> to put this bug on the to-do list. Currently the kernel doesn't
>>>>>>>> support any mechanism for obtaining the frequency in "turbo mode",
>>>>>>>> so powertop can't report any related info. A hardware feedback
>>>>>>>> mechanism does exist on the processor; we need to enable it.
>>>>>>>>
>>>>>>>>
>>>>>>>>
>>>>>>>>               
>>>>>>> It is true that the APERF/MPERF hardware feedback mechanism can help
>>>>>>> to identify that a processor has been in "turbo mode". But unless I'm
>>>>>>> mistaken there is no guarantee that when the processor is in Turbo
>>>>>>> Mode that APERF/MPERF will catch it. Among other things, it would
>>>>>>> depend upon your polling interval - which currently is pretty long.
>>>>>>>
>>>>>>>
>>>>>>>             
>>>>>> Can't we assume the processor is in turbo mode when we are in
>>>>>> P0 = (market frequency) + 1 MHz?
>>>>>>
>>>>>>
>>>>>>           
>>>>> I don't think that the processor being at P0 guarantees that it is
>>>>> in turbo mode. It means that the processor should operate no lower
>>>>> than market frequency and if the circumstances present themselves,
>>>>> then the hardware might overclock the processor for some amount of
>>>>> time. 
>>>>>
>>>>>
>>>>>         
>>>> When the kernel boots, we know whether the platform supports turbo
>>>> mode or not. I happen to have a box that supports turbo mode.
>>>>
>>>> When turbo mode is enabled, the supported frequencies are
>>>> 800MHz:1600MHz:2400MHz:2401MHz,
>>>>
>>>> and when turbo mode is disabled, the supported frequencies are
>>>> 800MHz:1600MHz:2400MHz.
>>>>
>>>> So being in turbo mode should mean:
>>>> 1) turbo mode is supported
>>>> 2) the processor is in P0
>>>> 3) P0 > marked frequency
>>>>
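
For readers following the thread, the three conditions above could be sketched roughly as below. This is only an illustration: the function names are made up, and the "top entry is exactly 1 MHz above the next one" rule is taken from the frequency lists quoted above. As Mark notes elsewhere in the thread, being in P0 is at best a necessary condition, not proof that the hardware is actually overclocking.

```python
import re


def parse_supported_mhz(spec):
    """Parse a colon-separated list such as '800Mhz:1600Mhz:2400Mhz:2401Mhz'."""
    return sorted(int(n) for n in re.findall(r"\d+", spec))


def turbo_advertised(freqs_mhz):
    """True if the highest entry is exactly 1 MHz above the next highest."""
    return len(freqs_mhz) >= 2 and freqs_mhz[-1] - freqs_mhz[-2] == 1


def may_be_in_turbo(freqs_mhz, current_mhz):
    """Necessary (not sufficient) check: turbo is advertised and the CPU is in P0."""
    return turbo_advertised(freqs_mhz) and current_mhz == freqs_mhz[-1]


turbo_on = parse_supported_mhz("800Mhz:1600Mhz:2400Mhz:2401Mhz")
turbo_off = parse_supported_mhz("800Mhz:1600Mhz:2400Mhz")
print(turbo_advertised(turbo_on), turbo_advertised(turbo_off))
```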
>>>>       
>>> Sorry, I don't understand what you mean above unless you are trying to
>>> say that we know that we are in turbo mode when the frequency (which I
>>> assume we'd try to figure out using APERF/MPERF) would be greater than
>>> the P1 reported frequency? But again, my point was that unless you
>>> happen to poll during the right time, the processor might have been in
>>> turbo mode and left it and you didn't catch it ... unless you are going
>>> to special case entering and leaving P0?
>>>     
>> We wouldn't poll in the kernel; let's do this with a dtrace probe and
>> do the check in powertop.
>>   
> Sure. I understand that we'd be adding a dtrace probe. The polling I'm 
> referring to is the powertop reporting interval. Though to be honest I 
> haven't taken a close look at the powertop implementation. I've been 
> assuming that it reports the statistics over a time interval?
>>   
>>>>>>> What exactly would powertop report about 'turbo mode'? Amount of
>>>>>>> time spent in "turbo mode"? I think a metric like that is bound to
>>>>>>> be incorrect given the current hardware support, isn't it?
>>>>>>>
>>>>>>>
>>>>>>>             
>>>>>> If I understand correctly, the bug reporter wants to know the
>>>>>> average frequency in turbo mode (P0) over a sampling period, if
>>>>>> the hardware supports it.
>>>>>>
>>>>>>
>>>>>>           
>>>>> This doesn't make much sense to me. A processor is likely (though
>>>>> maybe not with the current Solaris implementation) to switch in
>>>>> and out of P0 many times during a (powertop defined?) polling
>>>>> period. 
>>>>>
>>>>>
>>>>>         
>>>> It's doable, I think. We can calculate the APERF difference and the
>>>> MPERF difference between entering and leaving P0. The ratio of the
>>>> two differences then gives us the average frequency.
>>>>       
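
The calculation described above might look something like the sketch below. It assumes that MPERF ticks at a fixed reference frequency while APERF ticks at the actual frequency, that both are 64-bit counters that can wrap, and that the reference is the 2400 MHz figure from earlier in the thread. The function name and parameters are illustrative, not from any existing code.

```python
def average_mhz(aperf_start, aperf_end, mperf_start, mperf_end,
                reference_mhz, width=64):
    """Estimate the average frequency between two counter snapshots.

    Deltas are taken modulo 2**width so a single counter wrap is handled.
    Returns None when MPERF did not advance (no ratio can be formed).
    """
    mask = (1 << width) - 1
    d_aperf = (aperf_end - aperf_start) & mask
    d_mperf = (mperf_end - mperf_start) & mask
    if d_mperf == 0:
        return None
    return reference_mhz * d_aperf / d_mperf


# Snapshots taken on entering and leaving P0 (made-up numbers).
print(average_mhz(0, 2600, 0, 2400, reference_mhz=2400))
```

A result above the reference frequency would suggest some turbo residency during that interval, subject to the caveats raised in the rest of this thread.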
>>> You mean that we'd modify the P-state handling logic to special case
>>> entering and leaving P0 to keep some running tally? To what end? Let's
>>> say that during the polling period we enter and leave P0 5 times and
>>> therefore compute the APERF/MPERF ratio 5 times. What do we do at the
>>> end of the polling period? Do we sum those ratios and divide by 5? What
>>> good is this information? It seems to me that we are really stretching
>>> things to try to justify providing some turbo mode statistic.
>>>     
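
On the "sum those ratios and divide by 5" question, a small sketch of the two ways to combine several P0 residencies. The sample numbers are made up and the function names are illustrative. A plain mean of ratios gives a short residency the same weight as a long one, while summing the deltas first weights each residency by how long it lasted. Neither addresses whether the statistic is useful in the first place.

```python
def mean_of_ratios(samples):
    """Average the per-residency APERF/MPERF ratios, ignoring their length."""
    return sum(a / m for a, m in samples) / len(samples)


def ratio_of_sums(samples):
    """Duration-weighted ratio: total APERF delta over total MPERF delta."""
    total_aperf = sum(a for a, _ in samples)
    total_mperf = sum(m for _, m in samples)
    return total_aperf / total_mperf


# (delta APERF, delta MPERF) for each P0 residency: one short, one long.
samples = [(1100, 1000), (13000, 10000)]
print(mean_of_ratios(samples), ratio_of_sums(samples))
```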
>> Again, let's do this with a dtrace probe.
>>   
> 
> Sure, I understand that it would require a dtrace probe.
>>   
>>> BTW, I assume that a processor at P0 can go into C1E and so the
>>> APERF/MPERF will not really give an accurate frequency reading of the
>>> processor while at C0? 
>>>
>>> Mark
>>>     
>> Hmm, this is a problem. Let me check whether we can exclude C1 with a
>> dtrace probe.
>>   
> 
> Yes. It's the problem I've had with Intel's suggestion of using 
> APERF/MPERF to determine when the processor has been in turbo mode. I 
> don't think it's as simple as just computing APERF and MPERF numbers 
> over a given period and looking at the ratio. Intel added the turbo mode 
> support without providing observability support. Either that, or I'm 
> still misunderstanding something (quite possible ;-)).
> 
>> All in all, this is a reported bug/feature request, and to me it's
>> worth supporting.
>> What do you think? :)
>>   
> 
> Well, I never saw the bug/feature request and am still not certain what 
> the submitter is asking for. What would this statistic convey and of 
> what use would it be? I'd rather not add dtrace probes, complexity to P0 
> and C1 handling and a turbo mode statistic to powertop without the 
> statistic being meaningful.
> 
> Mark
> 
>> Thanks,
>> -Aubrey
>>   
> 

-- 
Darrin P. Johnson
Sr. Manager, SW Engineering
Solaris Core Kernel Group
Blog: http://blogs.sun.com/darrin
Email:  darrin.johnson at sun.com
Direct: (650) 786-2395
Cell:   (650) 796-1731
