[tesla-dev] bogus values for cpu speed

Rafael Vanoni Fri, 08 May 2009 16:41:33 -0700

Li, Aubrey wrote:
> Excellent catch of 
>> (a) the code (pt_cpufreq_dtrace_walk() and
>> pt_cpufreq_dtrace_account()) assume that the value stored in
>> time_accounted during the last interval belongs to the first
>> aggregate entry we walk when collecting data for the current
>> interval.
> 
> Due to the issue (a), (b) should be considered as well.
>> (b) we should *not* return if the dt_state_time is zero. We need to
>> continue down walk() and do the accounting.
> 
> The proposal looks pretty good and much cleaner. Some concerns below:


Thanks ;) I like it a lot too. Much cleaner.

> 1) My first time to see the aggregration in the profile-second action, Is 
> this 
> the reason that profile action is called [number of CPU] times?

The profile provider fires on every CPU, so it clears the value of the 
array for all entries. Does that answer your question ?

> 2) The variables "last" and "curr" need to be initialized, in case 
> ":::cpu-change-speed"
> will not be fired at all, this is very common under p-state poll-mode. 
> Otherwise the 
> aggregation could be invalid.

Yeah, I'm still writing the patch. Might be able to handle that with 
kstats alone.

> 3) Do we need to sync up the dtrace time with the powertop sampling period? 
> Although
> I don't see any problem so far.

Yes, otherwise we'll collect that for a shorter interval than we want. 
For instance, if the profile probe fires before we walk the aggregation, 
it will clear it and we'll have bad data.

I'm hoping to post a webrev by then end of the day.

Thanks,
Rafael

> Thanks,
> -Aubrey
> 
> Rafael.Vanoni wrote:
> 
>> Alright, I think I got it.
>>
>> I propose changing the cpufreq.c DTrace script for this one,
>> that fires
>> a profile-Ns probe at every interval and does all the accounting in a
>> much simpler way. 
>>
>> Running this one on the command line with a 3 second interval
>> I get the
>> following values:
>>
>> cpu   1 state  1330 res 531
>> cpu   0 state  1330 res 532
>> cpu   0 state   800 res 2448
>> cpu   1 state   800 res 2449
>>
>> cpu   0 state  1330 res 823
>> cpu   1 state  1330 res 823
>> cpu   0 state   800 res 2189
>> cpu   1 state   800 res 2189
>>
>> cpu   1 state  1330 res 1221
>> cpu   0 state  1330 res 1222
>> cpu   0 state   800 res 1724
>> cpu   1 state   800 res 1726
>>
>> The jitter is very small, and the solution is much cleaner than the
>> current one. 
>>
>> Rafael
>>
>>
>>
>> Rafael Vanoni wrote:
>>> Hey guys
>>>
>>> Kuriakose raised some good questions on a 'post code review'
>>> conversation. There are two, somewhat related problems :
>>>
>>> (a) the code (pt_cpufreq_dtrace_walk() and
>>> pt_cpufreq_dtrace_account()) assume that the value stored in
>>> time_accounted during the last interval belongs to the first
>>> aggregate entry we walk when collecting data for the current
>>> interval.  
>>>
>>> (b) we should *not* return if the dt_state_time is zero. We need to
>>> continue down walk() and do the accounting.
>>>
>>> Consider the following transitions:
>>>
>>>      t-2                      t-1                      t
>>>       |------------------------|-----------------------|
>>> 900   |              ----------|--------\              |
>>> 800   |         ----/          |         ----\         |
>>> 700   |--------/               |              ---------|
>>>
>>> which generate the following aggregations (speed, residency):
>>>
>>> 700 2     900 4
>>> 800 1     800 1
>>>
>>> We have (because of (b)):
>>>
>>>                    walk()     account() |    walk()     account()
>>>                  700    800      900    |  700    800      900
>>> dt_state_time     2      1        -     |   0      1
>>> total_time        2      1        2     |   -      0*
>>> dtrace_time       2      3        2     |   -      0*
>>> time_accounted    0      0        2     |   -      2
>>>
>>> But prior to (b) we still had
>>>
>>>                    walk()     account() |    walk()     account()
>>>                  700    800      900    |  700    800      900
>>> dt_state_time     2      1        -     |   0
>>> total_time        2      1        2     |   0      ...
>>> dtrace_time       2      3        3     |   0
>>> time_accounted    0      0        2     |   0**
>>>
>>> ** cpuidle.c:525
>>>
>>> So we're expecting the aggregation to be ordered descendingly by the
>>> residency at each state, which is not true. It's ordered by its keys.
>>>
>>> I'm working on a fix for this now.
>>>
>>> Thanks,
>>> Rafael
>>>
>>>
>>> _______________________________________________
>>> tesla-dev mailing list
>>> tesla-dev at opensolaris.org
>>> http://mail.opensolaris.org/mailman/listinfo/tesla-dev
>

[tesla-dev] bogus values for cpu speed

Reply via email to