Re: more on scheduler benchmarks
Joe deBlaquiere wrote:
>
> Maybe I've been off in the hardware lab for too long, but how about
>
> 1. using ioperm to give access to the parallel port.
> 2. have your program write a byte (thread id % 256 ?) constantly to the
>    port during its other activity.
> 3. capture the results from another computer with an ECP port.
>
> This way you don't run the risk of altering the scheduler behavior with
> your logging procedure.

It's a technique I've used in debugging realtime systems. It works
great, but bear in mind that the out instruction to the parallel port
costs an awful lot of cycles. You *will* alter the behaviour of the
scheduler.

--
Daniel
-
To unsubscribe from this list: send the line "unsubscribe linux-kernel"
in the body of a message to [EMAIL PROTECTED]
Please read the FAQ at http://www.tux.org/lkml/
Re: more on scheduler benchmarks
Bill Hartner wrote:
>
> Hubertus wrote :
>
> > The only problem I have with sched_yield like benchmarks is that it
> > creates artificial lock contention, as we basically spend most of the
> > time other than context switching + syscall under the scheduler lock.
> > This we won't see in real apps; that's why I think the chatroom
> > numbers are probably better indicators.
>
> Agreed. 100% artificial. The intention of the benchmark is to put a lot
> of pressure on the scheduler so that the benchmark results will be very
> "sensitive" to changes in schedule().

One approach would be to directly measure the time taken by schedule()
using the Pentium timestamp counter. You can do this with the
point-to-point timing patch which I did last year (recently refreshed).
It's at http://www.uow.edu.au/~andrewm/linux/#timepegs

Applying timepegs, plus schedule-timer.patch (attached), reveals that
vanilla schedule() takes 32 microseconds with 100 tasks on the runqueue,
and 4 usecs with an empty runqueue.

timepegs are probably a bit heavyweight for this. Their cache footprint
perhaps introduces some Heisenberg effects. Although, given that you're
only looking for deltas, this won't matter a lot.

[ hack, hack, hack ]

OK, schedule-hack-timer.patch open codes the measurement of schedule().
Just thump on ALT-SYSRQ-Q and multiply by your CPU clock period to get
the statistics. Booting with `noapic' and ignoring CPU0's results may
make things more repeatable... This patch gives similar figures to the
timepeg approach.

Running 'bwait' (also attached) to populate the runqueue, I see
schedule() taking the following amount of time:

  runqueue length    microseconds (500MHz PII)
         2                  5
         4                  6
         6                  6
         8                  6
        16                  7.5
        24                 11
        32                 15
        48                 20
        64                 25
       128                 44

Seems surprisingly slow?

-

#include <stdio.h>
#include <stdlib.h>
#include <unistd.h>

int main(int argc, char *argv[])
{
	int n = (argc > 1) ? atoi(argv[1]) : 20;
	int i;

	for (i = 0; i < n - 1; i++) {
		if (fork() == 0) {
			sleep(1);
			for ( ; ; )
				;
		}
	}
	printf("created %d busywaiters\n", n);
	for ( ; ; )
		;
}

--- linux-2.4.1-pre10/kernel/sched.c	Tue Jan 23 19:28:16 2001
+++ linux-akpm/kernel/sched.c	Tue Jan 23 23:18:21 2001
@@ -33,6 +33,13 @@
 extern void tqueue_bh(void);
 extern void immediate_bh(void);
 
+#include <asm/msr.h>
+static struct {
+	unsigned long acc_time;
+	unsigned long n_times;
+	unsigned long long in;
+} __cacheline_aligned schedule_stats[NR_CPUS + 1];
+
 /*
  * scheduler variables
  */
@@ -505,7 +512,7 @@
  * tasks can run. It can not be killed, and it cannot sleep. The 'state'
  * information in task[0] is never used.
  */
-asmlinkage void schedule(void)
+static void __schedule(void)
 {
 	struct schedule_data * sched_data;
 	struct task_struct *prev, *next, *p;
@@ -688,6 +695,88 @@
 	BUG();
 	return;
 }
+
+//
+
+static unsigned long long dummy;
+static unsigned long long calib;
+static int done_calib;
+
+static void do_one(void)
+{
+	rdtscll(dummy);
+}
+
+static void calibrate(void)
+{
+	unsigned long long in, out;
+	unsigned long flags;
+	int i;
+
+	local_irq_save(flags);
+	rdtscll(in);
+	for (i = 0; i < 0x10000; i++) {
+		do_one();
+	}
+	rdtscll(out);
+	local_irq_restore(flags);
+	calib = (out - in) >> 16;
+	done_calib = 1;
+}
+
+asmlinkage void schedule(void)
+{
+	int cpu = smp_processor_id();
+	unsigned long long out;
+
+	if (!done_calib)
+		calibrate();
+
+	rdtscll(schedule_stats[cpu].in);
+	__schedule();
+	rdtscll(out);
+
+	schedule_stats[cpu].acc_time += out - schedule_stats[cpu].in - calib;
+	schedule_stats[cpu].n_times++;
+}
+
+static atomic_t cpu_count;
+
+static void ss_dumper(void *dummy)
+{
+	int cpu = smp_processor_id();
+
+	while (atomic_read(&cpu_count) != cpu)
+		;
+	printk("CPU %d: %lu / %lu = %lu cycles/switch\n",
+		cpu, schedule_stats[cpu].acc_time, schedule_stats[cpu].n_times,
+		schedule_stats[cpu].acc_time / schedule_stats[cpu].n_times);
+
+	schedule_stats[NR_CPUS].acc_time += schedule_stats[cpu].acc_time;
+	schedule_stats[NR_CPUS].n_times += schedule_stats[cpu].n_times;
+
+	schedule_stats[cpu].acc_time = 0;
+	schedule_stats[cpu].n_times = 0;
+	atomic_inc(&cpu_count);
+	if (atomic_read(&cpu_count) == smp_num_cpus) {
+		printk("total: %lu / %lu = %lu cycles/switch\n",
Re: more on scheduler benchmarks
Maybe I've been off in the hardware lab for too long, but how about

1. using ioperm to give access to the parallel port.
2. have your program write a byte (thread id % 256 ?) constantly to the
   port during its other activity.
3. capture the results from another computer with an ECP port.

This way you don't run the risk of altering the scheduler behavior with
your logging procedure.

Mike Kravetz wrote:
> Last week while discussing scheduler benchmarks, Bill Hartner
> made a comment something like the following: "the benchmark may
> not even be invoking the scheduler as you expect". This comment
> did not fully sink in until this weekend, when I started thinking
> about changes made to sched_yield() in 2.4.0. (I'm cc'ing Ingo
> Molnar because I think he was involved in the changes.) If you
> haven't taken a look at sys_sched_yield() in 2.4.0, I suggest
> that you do that now.
>
> A result of new optimizations made to sys_sched_yield() is that
> calling sched_yield() does not result in a 'reschedule' if there
> are no tasks waiting for CPU resources. Therefore, I would claim
> that running 'scheduler benchmarks' which loop doing sched_yield()
> has little meaning/value for runs where the number of looping
> tasks is less than the number of CPUs in the system. Is that an
> accurate statement?
>
> If the above is accurate, then I am wondering what would be a
> good scheduler benchmark for these low task count situations.
> I could undo the optimizations in sys_sched_yield() (for testing
> purposes only!) and run the existing benchmarks. Can anyone
> suggest a better solution?
>
> Thanks,

--
Joe deBlaquiere
Red Hat, Inc.
307 Wynn Drive
Huntsville AL, 35805
voice : (256)-704-9200
fax   : (256)-837-3839
Re: more on scheduler benchmarks
On Monday 22 January 2001 10:30, Mike Kravetz wrote:
> Last week while discussing scheduler benchmarks, Bill Hartner
> made a comment something like the following: "the benchmark may
> not even be invoking the scheduler as you expect". This comment
> did not fully sink in until this weekend, when I started thinking
> about changes made to sched_yield() in 2.4.0. (I'm cc'ing Ingo
> Molnar because I think he was involved in the changes.) If you
> haven't taken a look at sys_sched_yield() in 2.4.0, I suggest
> that you do that now.
>
> A result of new optimizations made to sys_sched_yield() is that
> calling sched_yield() does not result in a 'reschedule' if there
> are no tasks waiting for CPU resources. Therefore, I would claim
> that running 'scheduler benchmarks' which loop doing sched_yield()
> has little meaning/value for runs where the number of looping
> tasks is less than the number of CPUs in the system. Is that an
> accurate statement?

With this kind of test, tasks are always running. If you print
nr_running you'll find that it is exactly (at least) the number of
tasks you've spawned, so the scheduler is always called.

- Davide