Hi All,

I have a user-level benchmark that does:

    for (i = 0; i < nthreads; i++)
        (void) thr_create(NULL, 0, testaes, (void *)0, THR_NEW_LWP, &tid);

I found that running this benchmark with nthreads == ncpus schedules each thread on a separate CPU. The system is a Niagara 2 with 128 CPUs/strands.

However, for a kernel module/benchmark that does:

    for (i = 0; i < nthreads; i++)
        (void) thread_create(NULL, 0, &process_aes, (void *)i, 0, &p0, TS_RUN, minclsyspri);

the scheduling is very uneven: none of CPUs 64-127 had a thread scheduled on it, and the distribution among CPUs 0-63 is also uneven.

I assume the scheduling behavior is different for system threads, which do not have an LWP. But is this not suboptimal? Is the assumption that kernel subsystems that need a large number of threads do their own CPU binding/scheduling to ensure even distribution?

Thanks,
-Krishna

_______________________________________________
perf-discuss mailing list
perf-discuss@opensolaris.org
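
For anyone trying to reproduce the comparison, here is a minimal sketch of how the unevenness could be quantified. It assumes each thread records the id of the CPU it ran on into an array (how that id is sampled is left out). The names `tally_cpus`, `cpu_stats`, and `ran_on` are hypothetical and not part of either benchmark above.

```c
#include <stddef.h>

/* Summary of how threads were spread over CPUs (hypothetical helper type). */
struct cpu_stats {
    int idle_cpus;  /* CPUs that ran no thread at all */
    int min_load;   /* fewest threads seen on any CPU */
    int max_load;   /* most threads seen on any CPU */
};

/*
 * ran_on[i] is the CPU id recorded by thread i (assumption: each thread
 * samples and stores it itself). counts[] is caller-provided scratch
 * space of ncpus ints and holds the per-CPU thread counts on return.
 * Returns 0 on success, -1 on a bad argument or out-of-range CPU id.
 */
int
tally_cpus(const int *ran_on, size_t nthreads, int *counts, int ncpus,
    struct cpu_stats *out)
{
    int c;
    size_t i;

    if (ncpus <= 0)
        return (-1);

    for (c = 0; c < ncpus; c++)
        counts[c] = 0;

    for (i = 0; i < nthreads; i++) {
        if (ran_on[i] < 0 || ran_on[i] >= ncpus)
            return (-1);
        counts[ran_on[i]]++;
    }

    out->idle_cpus = 0;
    out->min_load = counts[0];
    out->max_load = counts[0];
    for (c = 0; c < ncpus; c++) {
        if (counts[c] == 0)
            out->idle_cpus++;
        if (counts[c] < out->min_load)
            out->min_load = counts[c];
        if (counts[c] > out->max_load)
            out->max_load = counts[c];
    }
    return (0);
}
```

With nthreads == ncpus, a perfectly even spread gives idle_cpus == 0 and min_load == max_load == 1. The kernel-thread case described above would instead show idle_cpus of at least 64 and max_load greater than 1.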