On Apr 10, 2008, at 9:58 AM, Zeljko Vrba wrote:

> I have a 1:M producer-consumer problem (1 producer, M consumers) that I
> use for benchmarking. Every (producer, consumer_i) pair has a dedicated
> message queue implemented as a linked list protected by one
> pthread_mutex_t and two pthread_cond_t with default attributes.
>
> A 256MB block of RAM is divided into M = 2^0 .. 2^14 chunks, and for
> each chunk a new thread is created; the producer also has a dedicated
> thread (yes, I create up to 16385 threads). For each chunk, a simple
> message consisting of a (pointer, length) pair is sent 128 times to
> each of the workers and AES-encrypted. When a worker is finished with a
> chunk, it sends a reply to the producer. If the message queue is empty,
> the threads block on their respective condition variables associated
> with the queues. The producer first sends all 128 chunks to the workers
> (which are signalled if necessary), then waits for 128 replies from
> each of them. This is done serially, not round-robin (i.e. the first
> 128 messages are sent to worker 0, then to worker 1, etc.; similarly
> when reading replies). Each communication also includes allocation and
> freeing of a message; for this the umem allocator is used with a cache.
> The threads' stack sizes are set to 16kB.
>
> The benchmark is compiled in 64-bit mode and executed on Solaris 10 on
> a dual-core AMD64 (1 socket with two cores) with 2GB of RAM. Now the
> results: for M = 2^1 .. 2^11 (2 .. 2048) threads, the running time
> (wall-clock time) is fairly constant, around ~10 seconds. Beyond this
> number, as M doubles, the running time also roughly doubles: (2^12
> threads, 13s), (2^13 threads, 20s), (2^14 threads, 35s).
>
> Running iostat and vmstat in parallel confirms that no swapping occurs.
> 33% of the time is reported as spent in system (with 9% of CPU time
> idle?!), with ~150k/sec system calls and ~120k/sec context switches.
>
> Can anybody offer some insight into why this sudden degradation in
> performance occurs?
Do you have administrator access to this system? With dtrace(1M), you
can drill down on this and get more data on what's going on. See:

    http://www.sun.com/bigadmin/content/dtrace/

for an intro.

As a start, get the output from:

    dtrace -n 'profile-97 {@[stack(), ustack()] = count();} END {trunc(@, 10);}' \
        -c "your test command here"

where 'your test command here' is replaced with an invocation with 2^13
threads.[1] This will give us more data to start from.

Cheers,
- jonathan

[1] This invocation collects kernel and userland stack traces from all
CPUs 97 times a second while your command is running. We count the
number of times an identical stack trace is hit. When the test command
finishes, we keep only the 10 "most-hit" traces, and the dtrace command
prints them out, along with their counts. On an idle system, you'd see
something like:

    # dtrace -n 'profile-97 {@[stack(), ustack()] = count();} END {trunc(@, 10);}' -c 'sleep 2'
    dtrace: description 'profile-97' matched 2 probes
    dtrace: pid 102979 has exited
    CPU     ID                    FUNCTION:NAME
     10      2                             :END

                  unix`mach_cpu_idle+0x17
                  unix`cpu_idle+0xdd
                  unix`idle+0x10e
                  unix`thread_start+0x8
                 3120

--------------------------------------------------------------------------
Jonathan Adams, Sun Microsystems, ZFS Team    http://blogs.sun.com/jwadams
_______________________________________________
perf-discuss mailing list
perf-discuss@opensolaris.org