Zeljko Vrba wrote:
> On Fri, Apr 11, 2008 at 11:23:10AM -0700, David Lutz wrote:
>> Take a look at your cache miss rates as you cross the 2^11 boundary.
>> My guess is that you will see something start to go through the roof.
>>
> cputrack has too much overhead when having a bunch of LWPs. I did run
> cpustat though, in parallel with my experiment, with the following events
> on AMD64; the interval was 1 second:
>
> pic0=DC_miss,pic1=DC_dtlb_L1_miss_L2_miss,pic2=IC_itlb_L1_miss_L2_miss
>
> The number of data cache misses _does_ increase too, but what's worse is
> DTLB and ITLB misses. Both roughly double with the number of threads, but
> the number of ITLB misses saturates at ~470k/s, and this saturation happens
> at the transition between 2048 and 4096 threads.
>
> All threads are executing the same code, which is rather small -- so I see
> no reason for this linear increase in the number of ITLB misses with the
> number of threads. OK, more threads = more user<->kernel transitions. Does
> Solaris make use of the global bit in page directories/tables?
What's the size of the relevant TLBs? With text, stack and heap mappings
for all threads, this result isn't terribly surprising.

Solaris cannot use the global bit for user mappings, since the locations
of libraries, etc., aren't fixed. The kernel mappings are global if the
CPU supports that.

- Bart

--
Bart Smaalders                  Solaris Kernel Performance
[EMAIL PROTECTED]               http://blogs.sun.com/barts
"You will contribute more with mercurial than with thunderbird."
_______________________________________________
perf-discuss mailing list
perf-discuss@opensolaris.org