Re: [PATCH] procfs: export context switch counts in /proc/*/stat
"Albert Cahalan" <[EMAIL PROTECTED]> writes: > The cumulative ones are still not justified though, and I fear they > may be 64-bit even on i386. All the context switch counts are unsigned long. > It turns out that an i386 procps spends > much of its time doing 64-bit division to parse the damn ASCII crap. > I suppose I could just skip those fields, but generating them isn't > too cheap and probably I'd get stuck parsing them for some other > reason -- having them separate is probably a good idea. I can't think of a compelling justification for the cumulative context switch counts. But I suggest that if the cost of exposing these values is low enough, they should be exposed anyway, just for the sake of uniformity (these would be the only two getrusage values not present in /proc/pid/stat). If the decimal representation of values in /proc/pid/stat has such unpleasant overheads, then I wonder if that is something worth fixing, whether the context switch counts are added or not? It occurs to me that it would be easy to add support for a hex version of /proc/pid/stat with very little additional code, by using an alternate sprintf format string in fs/proc/array.c:do_task_stat(). I assume that procps could be adapted quite easily to take advantage of this? David - To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [PATCH] procfs: export context switch counts in /proc/*/stat
Albert Cahalan wrote:
> On 12/20/06, David Wragg <[EMAIL PROTECTED]> wrote:
> > "Albert Cahalan" <[EMAIL PROTECTED]> writes:
> > > On Mon, Dec 18, 2006 at 11:50:08PM +, David Wragg wrote:
> > >> This patch (against 2.6.19/2.6.19.1) adds the four context
> > >> switch values (voluntary context switches, involuntary
> > >> context switches, and the same values accumulated from
> > >> terminated child processes) to the end of /proc/*/stat,
> > >> similarly to min_flt, maj_flt and the time used values.
> > >
> > > Hmmm, OK, do people have a use for these values?
> >
> > My reason for writing the patch was to track which processes are
> > active (i.e. got scheduled to run) by polling these context switch
> > values. The time used values are not a reliable way to detect process
> > activity on fast machines. So for example, when sorting by %CPU, top
> > often shows many processes using 0% CPU, despite the fact that these
> > processes are running occasionally. If top sorted by (%CPU, context
> > switch count delta), it might give a more useful display of which
> > processes are active on the system.
>
> Oh, that'd be great.

It may be great, but it's really only a workaround. The real fix is in
changing the current probed proc-timing to an inlined one.

> The cumulative ones are still not justified though, and I fear they
> may be 64-bit even on i386. It turns out that an i386 procps spends
> much of its time doing 64-bit division to parse the damn ASCII crap.
> I suppose I could just skip those fields, but generating them isn't
> too cheap and probably I'd get stuck parsing them for some other
> reason -- having them separate is probably a good idea.

Agreed. It may also be advisable to add a top3 line in /proc/stat, to
circumvent parsing /proc/*/stat, when only checking who is eating CPU
most.

Thanks!
--
Al
Re: [PATCH] procfs: export context switch counts in /proc/*/stat
On 12/20/06, David Wragg <[EMAIL PROTECTED]> wrote:
> "Albert Cahalan" <[EMAIL PROTECTED]> writes:
> > On Mon, Dec 18, 2006 at 11:50:08PM +, David Wragg wrote:
> >> This patch (against 2.6.19/2.6.19.1) adds the four context
> >> switch values (voluntary context switches, involuntary
> >> context switches, and the same values accumulated from
> >> terminated child processes) to the end of /proc/*/stat,
> >> similarly to min_flt, maj_flt and the time used values.
> >
> > Hmmm, OK, do people have a use for these values?
>
> My reason for writing the patch was to track which processes are
> active (i.e. got scheduled to run) by polling these context switch
> values. The time used values are not a reliable way to detect process
> activity on fast machines. So for example, when sorting by %CPU, top
> often shows many processes using 0% CPU, despite the fact that these
> processes are running occasionally. If top sorted by (%CPU, context
> switch count delta), it might give a more useful display of which
> processes are active on the system.

Oh, that'd be great.

The cumulative ones are still not justified though, and I fear they
may be 64-bit even on i386. It turns out that an i386 procps spends
much of its time doing 64-bit division to parse the damn ASCII crap.
I suppose I could just skip those fields, but generating them isn't
too cheap and probably I'd get stuck parsing them for some other
reason -- having them separate is probably a good idea.
Re: [PATCH] procfs: export context switch counts in /proc/*/stat
Arjan van de Ven <[EMAIL PROTECTED]> writes:
> On Wed, 2006-12-20 at 14:38 +, David Wragg wrote:
>> (When I try the script, stap complains about the lack of the kernel
>> debuginfo package, which of course doesn't exist for my self-built
>> kernel. After hunting around on the web for 10 minutes, I'm still no
>> closer to resolving this. But I look forward to playing with
>> systemtap once I get past that problem.)
>
> what worked for me is copying the "vmlinux" file to /boot as
> /boot/vmlinux-`uname -r`

Thanks, that's got it working.
Re: [PATCH] procfs: export context switch counts in /proc/*/stat
On Wed, 2006-12-20 at 14:38 +, David Wragg wrote:
> Arjan van de Ven <[EMAIL PROTECTED]> writes:
> > if all you care is the number of context switches, you can use the
> > following system tap script as well:
> >
> > http://www.fenrus.org/cstop.stp
>
> Thanks, something similar to that might well have solved my original
> problem.
>
> (When I try the script, stap complains about the lack of the kernel
> debuginfo package, which of course doesn't exist for my self-built
> kernel. After hunting around on the web for 10 minutes, I'm still no
> closer to resolving this. But I look forward to playing with
> systemtap once I get past that problem.)

what worked for me is copying the "vmlinux" file to /boot as
/boot/vmlinux-`uname -r`

(strace the stap program to see what it tries to load)

--
if you want to mail me at work (you don't), use arjan (at) linux.intel.com
Test the interaction between Linux and your BIOS via http://www.linuxfirmwarekit.org
Re: [PATCH] procfs: export context switch counts in /proc/*/stat
Arjan van de Ven <[EMAIL PROTECTED]> writes:
> if all you care is the number of context switches, you can use the
> following system tap script as well:
>
> http://www.fenrus.org/cstop.stp

Thanks, something similar to that might well have solved my original
problem.

(When I try the script, stap complains about the lack of the kernel
debuginfo package, which of course doesn't exist for my self-built
kernel. After hunting around on the web for 10 minutes, I'm still no
closer to resolving this. But I look forward to playing with
systemtap once I get past that problem.)

Nonetheless, while systemtap might provide an objection to adding
per-task context switch counters to the kernel, it doesn't answer the
question, since we do have these counters, why not expose them in the
normal way?

David
Re: [PATCH] procfs: export context switch counts in /proc/*/stat
On Wed, 2006-12-20 at 13:20 +, David Wragg wrote:
> "Albert Cahalan" <[EMAIL PROTECTED]> writes:
> > On Mon, Dec 18, 2006 at 11:50:08PM +, David Wragg wrote:
> >> This patch (against 2.6.19/2.6.19.1) adds the four context
> >> switch values (voluntary context switches, involuntary
> >> context switches, and the same values accumulated from
> >> terminated child processes) to the end of /proc/*/stat,
> >> similarly to min_flt, maj_flt and the time used values.
> >
> > Hmmm, OK, do people have a use for these values?
>
> My reason for writing the patch was to track which processes are
> active (i.e. got scheduled to run) by polling these context switch
> values. The time used values are not a reliable way to detect process
> activity on fast machines. So for example, when sorting by %CPU, top
> often shows many processes using 0% CPU, despite the fact that these
> processes are running occasionally. If top sorted by (%CPU, context
> switch count delta), it might give a more useful display of which
> processes are active on the system.

if all you care is the number of context switches, you can use the
following system tap script as well:

http://www.fenrus.org/cstop.stp

--
if you want to mail me at work (you don't), use arjan (at) linux.intel.com
Test the interaction between Linux and your BIOS via http://www.linuxfirmwarekit.org
Re: [PATCH] procfs: export context switch counts in /proc/*/stat
"Albert Cahalan" <[EMAIL PROTECTED]> writes: > On Mon, Dec 18, 2006 at 11:50:08PM +, David Wragg wrote: >> This patch (against 2.6.19/2.6.19.1) adds the four context >> switch values (voluntary context switches, involuntary >> context switches, and the same values accumulated from >> terminated child processes) to the end of /proc/*/stat, >> similarly to min_flt, maj_flt and the time used values. > > Hmmm, OK, do people have a use for these values? My reason for writing the patch was to track which processes are active (i.e. got scheduled to run) by polling these context switch values. The time used values are not a reliable way to detect process activity on fast machines. So for example, when sorting by %CPU, top often shows many processes using 0% CPU, despite the fact that these processes are running occasionally. If top sorted by (%CPU, context switch count delta), it might give a more useful display of which processes are active on the system. More generally, it seems perverse to track these context switch values but only expose them through the constrained getrusage interface. If they are worth having, why aren't they worth exposing in the same way as all other process info? > [...] >> Putting just these four values into a new file would seem a little >> odd, since they have a lot in common with the other getrusage values >> that are already in /proc/pid/stat. One possibility is to add >> /proc/pid/rusage, mirroring the full struct rusage in text form, since >> struct rusage is already part of the kernel ABI (though Linux doesn't >> fill in half of the values). > > Since we already have a struct defined and all... > > sys_get_rusage(int pid) That would be a much more useful system call than getrusage. But why have two ways of retrieving process info, /proc and a sys_get_rusage, exposing differing subsets of process information? 
David
Re: [PATCH] procfs: export context switch counts in /proc/*/stat
David Wragg writes:
> Benjamin LaHaise <[EMAIL PROTECTED]> writes:
>> On Mon, Dec 18, 2006 at 11:50:08PM +, David Wragg wrote:
>>> This patch (against 2.6.19/2.6.19.1) adds the four context
>>> switch values (voluntary context switches, involuntary
>>> context switches, and the same values accumulated from
>>> terminated child processes) to the end of /proc/*/stat,
>>> similarly to min_flt, maj_flt and the time used values.

Hmmm, OK, do people have a use for these values?

>> Please put these into new files, as the stat files in /proc are
>> horribly overloaded and have always been somewhat problematic
>> when it comes to changing how things are reported due to internal
>> changes to the kernel. Cheers,

No thanks. Yours truly, the maintainer of "ps", "top", "vmstat", etc.

> The delay accounting value was added to the end of /proc/pid/stat back
> in July without discussion, so I assumed this approach was still
> considered satisfactory.

/proc/*/stat is the very best place in /proc for any per-process
data that will be commonly needed. Unlike /proc/*/status, few people
are tempted to screw with the formatting and/or spelling. Unlike
the /sys crap, it doesn't take 3 syscalls PER VALUE to get at the
data.

The things to ask are of course: will this really be used, and does
it really belong in /proc at all?

> Putting just these four values into a new file would seem a little
> odd, since they have a lot in common with the other getrusage values
> that are already in /proc/pid/stat. One possibility is to add
> /proc/pid/rusage, mirroring the full struct rusage in text form, since
> struct rusage is already part of the kernel ABI (though Linux doesn't
> fill in half of the values).

Since we already have a struct defined and all...

sys_get_rusage(int pid)

> Or perhaps it makes sense to reorganize all the values from
> /proc/pid/stat and its siblings into a sysfs-like one-value-per-file
> structure, though that might introduce atomicity and efficiency issues
> (calculating some of the values involves iterating over the threads in
> the process; with everything in one file, these loops are folded
> together).

Yeah, big time. Things are quite bad in /proc, but /sys is a joke.
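The "3 syscalls PER VALUE" remark refers to the open/read/close sequence needed to fetch any small file. A minimal sketch of that sequence follows; the helper name read_small_file is hypothetical. With a one-value-per-file layout this cost is paid once per value, whereas with /proc/pid/stat the same three calls return every field on the line.

```c
#include <assert.h>
#include <fcntl.h>
#include <unistd.h>

/* Hypothetical helper: read a small text file into buf (NUL-terminated).
 * Returns the number of bytes read, or -1 on error.
 * Cost: three system calls, regardless of how many values the file holds. */
static ssize_t read_small_file(const char *path, char *buf, size_t cap)
{
    int fd;
    ssize_t n;

    assert(path != NULL && buf != NULL && cap > 0);

    fd = open(path, O_RDONLY);      /* syscall 1 */
    if (fd < 0)
        return -1;
    n = read(fd, buf, cap - 1);     /* syscall 2 */
    close(fd);                      /* syscall 3 */
    if (n < 0)
        return -1;
    buf[n] = '\0';
    return n;
}
```

For a process listing with N processes and K values each, a per-value layout would need on the order of 3*N*K calls under this scheme, against 3*N for a single stat file per process.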
Re: [PATCH] procfs: export context switch counts in /proc/*/stat
Benjamin LaHaise <[EMAIL PROTECTED]> writes:
> On Mon, Dec 18, 2006 at 11:50:08PM +, David Wragg wrote:
>> This patch (against 2.6.19/2.6.19.1) adds the four context switch
>> values (voluntary context switches, involuntary context switches, and
>> the same values accumulated from terminated child processes) to the
>> end of /proc/*/stat, similarly to min_flt, maj_flt and the time used
>> values.
>
> Please put these into new files, as the stat files in /proc are
> horribly overloaded and have always been somewhat problematic
> when it comes to changing how things are reported due to internal
> changes to the kernel. Cheers,

The delay accounting value was added to the end of /proc/pid/stat back
in July without discussion, so I assumed this approach was still
considered satisfactory.

Putting just these four values into a new file would seem a little
odd, since they have a lot in common with the other getrusage values
that are already in /proc/pid/stat. One possibility is to add
/proc/pid/rusage, mirroring the full struct rusage in text form, since
struct rusage is already part of the kernel ABI (though Linux doesn't
fill in half of the values).

Or perhaps it makes sense to reorganize all the values from
/proc/pid/stat and its siblings into a sysfs-like one-value-per-file
structure, though that might introduce atomicity and efficiency issues
(calculating some of the values involves iterating over the threads in
the process; with everything in one file, these loops are folded
together).

Any thoughts?

David
Re: [PATCH] procfs: export context switch counts in /proc/*/stat
On Mon, Dec 18, 2006 at 11:50:08PM +, David Wragg wrote:
> This patch (against 2.6.19/2.6.19.1) adds the four context switch
> values (voluntary context switches, involuntary context switches, and
> the same values accumulated from terminated child processes) to the
> end of /proc/*/stat, similarly to min_flt, maj_flt and the time used
> values.

Please put these into new files, as the stat files in /proc are
horribly overloaded and have always been somewhat problematic
when it comes to changing how things are reported due to internal
changes to the kernel. Cheers,

		-ben
--
"Time is of no importance, Mr. President, only life is important."
Don't Email: <[EMAIL PROTECTED]>.
[PATCH] procfs: export context switch counts in /proc/*/stat
The kernel already maintains context switch counts for each task, and
exposes them through getrusage(2). These counters can also be used more
generally to track which processes on the system are active (i.e.
getting scheduled to run), but getrusage is too constrained to use it in
that way.

This patch (against 2.6.19/2.6.19.1) adds the four context switch values
(voluntary context switches, involuntary context switches, and the same
values accumulated from terminated child processes) to the end of
/proc/*/stat, similarly to min_flt, maj_flt and the time used values.

Signed-off-by: David Wragg <[EMAIL PROTECTED]>

diff -uprN --exclude='*.o' --exclude='*~' --exclude='.*' linux-2.6.19.1/fs/proc/array.c linux-2.6.19.1.build/fs/proc/array.c
--- linux-2.6.19.1/fs/proc/array.c	2006-12-18 14:35:36.0 +
+++ linux-2.6.19.1.build/fs/proc/array.c	2006-12-18 14:43:21.0 +
@@ -327,6 +327,8 @@ static int do_task_stat(struct task_stru
 	unsigned long cmin_flt = 0, cmaj_flt = 0;
 	unsigned long min_flt = 0, maj_flt = 0;
 	cputime_t cutime, cstime, utime, stime;
+	unsigned long cnvcsw = 0, cnivcsw = 0;
+	unsigned long nvcsw = 0, nivcsw = 0;
 	unsigned long rsslim = 0;
 	char tcomm[sizeof(task->comm)];
 	unsigned long flags;
@@ -369,6 +371,8 @@ static int do_task_stat(struct task_stru
 		cmaj_flt = sig->cmaj_flt;
 		cutime = sig->cutime;
 		cstime = sig->cstime;
+		cnvcsw = sig->cnvcsw;
+		cnivcsw = sig->cnivcsw;
 		rsslim = sig->rlim[RLIMIT_RSS].rlim_cur;
 
 		/* add up live thread stats at the group level */
@@ -379,6 +383,8 @@ static int do_task_stat(struct task_stru
 				maj_flt += t->maj_flt;
 				utime = cputime_add(utime, t->utime);
 				stime = cputime_add(stime, t->stime);
+				nvcsw += t->nvcsw;
+				nivcsw += t->nivcsw;
 				t = next_thread(t);
 			} while (t != task);
 
@@ -386,6 +392,8 @@ static int do_task_stat(struct task_stru
 			maj_flt += sig->maj_flt;
 			utime = cputime_add(utime, sig->utime);
 			stime = cputime_add(stime, sig->stime);
+			nvcsw += sig->nvcsw;
+			nivcsw += sig->nivcsw;
 		}
 
 		sid = sig->session;
@@ -404,6 +412,8 @@ static int do_task_stat(struct task_stru
 		maj_flt = task->maj_flt;
 		utime = task->utime;
 		stime = task->stime;
+		nvcsw = task->nvcsw;
+		nivcsw = task->nivcsw;
 	}
 
 	/* scale priority and nice values from timeslices to -20..20 */
@@ -420,7 +430,7 @@ static int do_task_stat(struct task_stru
 
 	res = sprintf(buffer,"%d (%s) %c %d %d %d %d %d %lu %lu \
 %lu %lu %lu %lu %lu %ld %ld %ld %ld %d 0 %llu %lu %ld %lu %lu %lu %lu %lu \
-%lu %lu %lu %lu %lu %lu %lu %lu %d %d %lu %lu %llu\n",
+%lu %lu %lu %lu %lu %lu %lu %lu %d %d %lu %lu %llu %lu %lu %lu %lu\n",
 		task->pid,
 		tcomm,
 		state,
@@ -465,7 +475,12 @@ static int do_task_stat(struct task_stru
 		task_cpu(task),
 		task->rt_priority,
 		task->policy,
-		(unsigned long long)delayacct_blkio_ticks(task));
+		(unsigned long long)delayacct_blkio_ticks(task),
+		nvcsw,
+		cnvcsw,
+		nivcsw,
+		cnivcsw);
+
 	if(mm)
 		mmput(mm);
 	return res;
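As an illustration of how a userspace tool might consume the four appended values, the sketch below pulls the last four space-separated fields off a stat line, in the order the patch prints them (nvcsw, cnvcsw, nivcsw, cnivcsw). This assumes a kernel with this proposed patch applied; on a kernel without it the last four fields of /proc/*/stat mean something else. The struct and function names are hypothetical. Working from the end of the line avoids having to deal with spaces or parentheses inside the (comm) field.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical container, in the order the patch prints the fields. */
struct csw_fields {
    unsigned long nvcsw;    /* voluntary switches */
    unsigned long cnvcsw;   /* voluntary switches of terminated children */
    unsigned long nivcsw;   /* involuntary switches */
    unsigned long cnivcsw;  /* involuntary switches of terminated children */
};

/* Parse the last four space-separated fields of a stat line.
 * Returns 0 on success, -1 if the line is too short or malformed. */
static int parse_trailing_csw(const char *line, struct csw_fields *out)
{
    const char *p;
    int fields = 0;

    assert(line != NULL && out != NULL);
    p = line + strlen(line);

    /* Skip the trailing newline and any trailing spaces. */
    while (p > line && (p[-1] == '\n' || p[-1] == ' '))
        p--;

    /* Walk backwards over four fields, leaving p at the start of the
     * fourth field from the end. */
    while (p > line && fields < 4) {
        while (p > line && p[-1] != ' ')
            p--;
        fields++;
        if (fields < 4)
            while (p > line && p[-1] == ' ')
                p--;
    }
    if (fields < 4)
        return -1;

    if (sscanf(p, "%lu %lu %lu %lu", &out->nvcsw, &out->cnvcsw,
               &out->nivcsw, &out->cnivcsw) != 4)
        return -1;
    return 0;
}
```

A caller would read /proc/<pid>/stat into a buffer and pass it to parse_trailing_csw; the difference between successive nvcsw + nivcsw readings gives the context switch count delta mentioned in the thread.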