On Tue, Feb 12, 2008 at 01:28:21PM -0800, Pallipadi, Venkatesh wrote:
>
>
> -----Original Message-----
> From: Kevin Hilman [mailto:[EMAIL PROTECTED]
> Sent: Thursday, January 31, 2008 5:01 PM
> To: Pallipadi, Venkatesh
> Cc: [email protected]; Adam Belay
> Subject: Re: [PATCH/RFC] powertop non-ACPI: use CPUidle for C-state
> details
>
> "Pallipadi, Venkatesh" <[EMAIL PROTECTED]> writes:
>
> >>
> >>"Pallipadi, Venkatesh" <[EMAIL PROTECTED]> writes:
> >>
> >>> Nice timing for the patch. I am working on cpuidle patches that will
> >>> add things like C1 idle residency time in /sysfs and was asking
> >>> Arjan about switching powertop to use cpuidle interfaces :-).
> >>
> >>Great! My platform reports residency for all the states in sysfs
> >>already, so powertop using sysfs is very useful to me.
> >>
> >>> One major comment is that with the patch C-state timing in powertop
> >>> seems to be off, Due to different units of time in /proc/acpi and
> >>> sysfs cpuidle. /proc/acpi C-state timings are in ACPI PM timer
> >>> ticks and cpuidle exports it in micro-seconds.
> >>
> >>Yes, the percentage numbers are indeed off. I was just noticing the
> >>same problem. I'm working right now on changing those calculations to
> >>use usecs instead of ticks.
> >>
> >>I assume the hardcoded frequency in powertop (3579.545) is the
> >>frequency of the ACPI timer? I'm pretty ignorant of ACPI, but am
> >>guessing that's what those calculations are for.
> >
> > Yes. That hardcoded freq number is coming from ACPI PM timer
> frequency.
> >
>
> That being the case, the following patch makes the residency
> calculations work using either usecs for CPUidle or ACPI ticks.
> Basically, just setting FREQ=1000 make the calculations work out for
> usecs.
>
> Now I'm seeing correct residency percentages on my platform using
> CPUidle.
>
> Kevin
>
Arjan,
Below is the patch from Kevin, which I slightly modified to fix one bug on
SMP systems, wherein variable dir was getting used across 2 while loops
causing some confusions.
With this patch, we still look at /proc/acpi first and then fallback to
/sys/.../cpuidle. That order has to change sometime in future as
/proc/acpi/.../power is sort of deprecated. Also, there are patches in
upstream git/mm kernel which adds more C-state information like C1 residency
time, a C0 poll idle state and mapping to hardware C-state. That will mean
/sys/.../cpuidle has more info than /proc/acpi.
Thanks,
Venki
diff -purN powertop-1.9/powertop.c powertop-1.9-mod/powertop.c
--- powertop-1.9/powertop.c 2007-10-29 10:45:56.000000000 -0700
+++ powertop-1.9-mod/powertop.c 2007-10-18 04:57:23.000000000 -0700
@@ -35,6 +35,7 @@
#include <assert.h>
#include <locale.h>
#include <time.h>
+#include <sys/stat.h>
#include "powertop.h"
@@ -62,7 +63,8 @@ struct irqdata {
struct irqdata interrupts[IRQCOUNT];
-#define FREQ 3579.545
+#define FREQ_ACPI 3579.545
+static unsigned long FREQ;
int nostats;
@@ -236,7 +238,7 @@ static void do_proc_irq(void)
fclose(file);
}
-static void read_data(uint64_t * usage, uint64_t * duration)
+static void read_data_acpi(uint64_t * usage, uint64_t * duration)
{
DIR *dir;
struct dirent *entry;
@@ -286,6 +288,108 @@ static void read_data(uint64_t * usage,
closedir(dir);
}
+static void read_data_cpuidle(uint64_t * usage, uint64_t * duration)
+{
+ DIR *cpudir;
+ DIR *dir;
+ struct dirent *entry;
+ FILE *file = NULL;
+ char line[4096];
+ char filename[128], *f;
+ int len, clevel = 0;
+ unsigned long usecs, ticks_per_sec = 32768, usecs_per_sec = 1000000;
+
+ memset(usage, 0, 64);
+ memset(duration, 0, 64);
+
+ cpudir = opendir("/sys/devices/system/cpu");
+ if (!cpudir)
+ return;
+
+ /* Loop over cpuN entries */
+ while ((entry = readdir(cpudir))) {
+ if (strlen(entry->d_name) < 3)
+ continue;
+
+ if (!isdigit(entry->d_name[3]))
+ continue;
+
+ len = sprintf(filename, "/sys/devices/system/cpu/%s/cpuidle",
+ entry->d_name);
+
+ dir = opendir(filename);
+ if (!dir)
+ return;
+
+ clevel = 0;
+
+ /* For each C-state, there is a stateX directory which
+ * contains a 'usage' and a 'time' (duration) file */
+ while ((entry = readdir(dir))) {
+ if (strlen(entry->d_name) < 3)
+ continue;
+ sprintf(filename + len, "/%s/usage", entry->d_name);
+ file = fopen(filename, "r");
+ if (!file)
+ continue;
+
+ memset(line, 0, 4096);
+ f = fgets(line, 4096, file);
+ fclose(file);
+ if (f == NULL)
+ break;
+
+ usage[clevel] += 1+strtoull(line, NULL, 10);
+
+ sprintf(filename + len, "/%s/time", entry->d_name);
+ file = fopen(filename, "r");
+ if (!file)
+ continue;
+
+ memset(line, 0, 4096);
+ f = fgets(line, 4096, file);
+ fclose(file);
+ if (f == NULL)
+ break;
+
+ duration[clevel] += 1+strtoull(line, NULL, 10);
+
+ clevel++;
+ if (clevel > maxcstate)
+ maxcstate = clevel;
+
+ }
+ closedir(dir);
+
+ }
+ closedir(cpudir);
+}
+
+static void read_data(uint64_t * usage, uint64_t * duration)
+{
+ int r;
+ struct stat s;
+
+ /* First, check for ACPI */
+ r = stat("/proc/acpi/processor", &s);
+ if (!r) {
+ read_data_acpi(usage, duration);
+
+ /* perform residency calculations based on ACPI timer */
+ FREQ = FREQ_ACPI;
+ return;
+ }
+
+ /* Then check for CPUidle */
+ r = stat("/sys/devices/system/cpu/cpuidle", &s);
+ if (!r) {
+ read_data_cpuidle(usage, duration);
+
+ /* perform residency calculations based on usecs */
+ FREQ = 1000;
+ }
+}
+
void stop_timerstats(void)
{
FILE *file;
@@ -530,7 +634,7 @@ int main(int argc, char **argv)
memset(&cstate_lines, 0, sizeof(cstate_lines));
topcstate = -4;
if (totalevents == 0 && maxcstate <= 1) {
- sprintf(cstate_lines[5],_("< Detailed C-state
information is only available on Mobile CPUs (laptops) >\n"));
+ sprintf(cstate_lines[5],_("< Detailed C-state
information is not available.>\n"));
} else {
double sleept, percentage;;
c0 = sysconf(_SC_NPROCESSORS_ONLN) * ticktime * 1000 *
FREQ - totalticks;
_______________________________________________
Power mailing list
[email protected]
http://www.bughost.org/mailman/listinfo/power