>> Li, Aubrey wrote:
>>> Rafael.Vanoni wrote:
>>>
>>>> Li, Aubrey wrote:
>>>>> Rafael Vanoni wrote:
>>>>>
>>>>>> Aubrey Li wrote:
>>>>>>> Here is a bug and fix from J?rgen.
>>>>>>>
>>>>>>> J?rgen Keil <> wrote:
>>>>>>>> It's a bug in powertop. The dtrace code that powertop is using
>>>>>>>> aggregates on a processorid_t (cpu) and an uint32_t (cpuspeed).
>>>>>>>>
>>>>>>>> But in function pt_cpufreq_dtrace_walk() the cpu speed is
>>>>>>>> read as uint64_t - this is working with bogus values for cpu
>>>>>>>> speed!
>>>>>>>>
>>>>>> Thanks for picking this up, J?rgen.
>>>>>>
>>>>>> After applying your patch, the p-state numbers would display
>>>>>> negative values when the sampling interval was shorter than the
>>>>>> interval between firings of the cpu-change-speed probe. I added a
>>>>>> check in pt_cpufreq_stat_account() to fix that. I'm including
>>>>>> the diff below.
>>>>>>
>>>>>> I'm still seeing the original issue when observing CPUs other
>>>>>> than CPU
>>>>>> 0. The --cpu option has been on the project gate for a while,
>>>>>> but was only putback last Friday.
>>>>>>
>>>>> When walk the aggregation of the cpufreq,
>>>>> This implementation looks wrong for the specific CPU.
>>>>>
>>>>> for (i = 0; i < g_ncpus; i++) {
>>>>> /* LINTED - alignment */
>>>>> dt_state_time += *((hrtime_t
>>>>> *)(data->dtada_percpu[i])); }
>>>>>
Here is a suggested patch:
changelog:
1) speed cast back to be (uint64_t)
2) make sure duration is not negative.
3) I failed to understand why the current walk function need to go through
all CPU to get the dtrace time everytime to fill a per-cpu
array(g_cpu_power_states).
That might be the reason why duration < 0. So change to count the specific cpu
only.
4) if we have the condition check (dt_state_time > cpu_pow->time_accounted), we
also
need to check the opposite case. time_accounted should be reset if it's not
used by this
time.
If the following patch makes sense, we'll have to fix the similar cstate issue.
Thanks,
-Aubrey
===========================================================
diff -r a64cc7e5e86d usr/src/cmd/powertop/common/cpufreq.c
--- a/usr/src/cmd/powertop/common/cpufreq.c Mon Apr 13 23:01:54 2009 -0700
+++ b/usr/src/cmd/powertop/common/cpufreq.c Tue Apr 14 01:02:03 2009 -0700
@@ -76,7 +76,7 @@
"/last[(processorid_t)arg0] != 0/"
"{"
" this->cpu = (processorid_t)arg0;"
-" this->oldspeed = (uint32_t)(arg1/1000000);"
+" this->oldspeed = (uint64_t)(arg1/1000000);"
" @times[this->cpu, this->oldspeed] = sum(timestamp - last[this->cpu]);"
" last[this->cpu] = timestamp;"
"}"
@@ -84,7 +84,7 @@
"/last[(processorid_t)arg0] == 0/"
"{"
" this->cpu = (processorid_t)arg0;"
-" this->oldspeed = (uint32_t)(arg1/1000000);"
+" this->oldspeed = (uint64_t)(arg1/1000000);"
" @times[this->cpu, this->oldspeed] = sum(timestamp - begin);"
" last[this->cpu] = timestamp;"
"}";
@@ -105,7 +105,7 @@
" last != 0/"
"{"
" this->cpu = (processorid_t)arg0;"
-" this->oldspeed = (uint32_t)(arg1/1000000);"
+" this->oldspeed = (uint64_t)(arg1/1000000);"
" @times[this->cpu, this->oldspeed] = sum(timestamp - last);"
" last = timestamp;"
"}"
@@ -114,7 +114,7 @@
" last == 0/"
"{"
" this->cpu = (processorid_t)arg0;"
-" this->oldspeed = (uint32_t)(arg1/1000000);"
+" this->oldspeed = (uint64_t)(arg1/1000000);"
" @times[this->cpu, this->oldspeed] = sum(timestamp - begin);"
" last = timestamp;"
"}";
@@ -347,6 +347,8 @@
speed = cpu_pow->current_pstate;
duration = (hrtime_t)((interval * NANOSEC)) - cpu_pow->dtrace_time;
+ if (duration < 0)
+ return;
for (i = 0; i < g_npstates; i++) {
if (g_pstate_info[i].speed == speed) {
@@ -439,11 +441,6 @@
cpu_rec = &aggdesc->dtagd_rec[1];
speed_rec = &aggdesc->dtagd_rec[2];
- for (i = 0; i < g_ncpus; i++) {
- /* LINTED - alignment */
- dt_state_time += *((hrtime_t *)(data->dtada_percpu[i]));
- }
-
/* LINTED - alignment */
cpu = *(int32_t *)(data->dtada_data + cpu_rec->dtrd_offset);
/* LINTED - alignment */
@@ -454,6 +451,9 @@
speed = max_cpufreq;
}
+ /* LINTED - alignment */
+ dt_state_time += *((hrtime_t *)(data->dtada_percpu[cpu]));
+
/*
* We have an aggregation record for "cpu" being at "speed"
* for an interval of "n" nanoseconds. The reported interval
@@ -476,6 +476,9 @@
dt_state_time -=
cpu_pow->time_accounted;
cpu_pow->time_accounted = 0;
+ } else {
+ dt_state_time = 0;
+ cpu_pow->time_accounted = 0;
}
}
g_pstate_info[i].total_time += dt_state_time;
================================================================