What version/rev of Solaris are you running? Thanks
Dave V

Jonathan Adams wrote:
> On Apr 10, 2008, at 9:58 AM, Zeljko Vrba wrote:
>
>> I have a 1:M producer-consumer problem (1 producer, M consumers)
>> that I use for benchmarking. Every (producer, consumer_i) pair has
>> a dedicated message queue implemented as a linked list protected
>> by one pthread_mutex_t and two pthread_cond_t with default
>> attributes.
>>
>> A 256 MB block of RAM is divided into M = 2^0 .. 2^14 chunks, and for
>> each chunk a new thread is created; the producer also has a
>> dedicated thread (yes, I create up to 16385 threads). For each
>> chunk, a simple message consisting of a (pointer, length) pair is
>> sent 128 times to each of the workers and AES-encrypted. When a
>> worker is finished with a chunk, it sends a reply to the producer.
>> If a message queue is empty, the threads block on the respective
>> condition variables associated with their queues. The producer first
>> sends all 128 chunks to the workers (which are signalled if
>> necessary), then waits for 128 replies from each of them. This is
>> done serially, not round-robin (i.e., the first 128 messages are
>> sent to worker 0, then to worker 1, etc.; similarly when reading
>> replies). Each communication also includes allocating and freeing
>> a message; for this the umem allocator is used with a cache. The
>> threads' stack sizes are set to 16 kB.
>>
>> The benchmark is compiled in 64-bit mode and executed on Solaris 10,
>> dual-core AMD64 (1 socket with two cores) with 2 GB of RAM. Now the
>> results: for M = 2^1 .. 2^11 (2 .. 2048) threads, the running time
>> (wall-clock time) is fairly constant at ~10 seconds. Beyond this
>> number, as M doubles, the running time also roughly doubles:
>> (2^12 threads, 13 s), (2^13 threads, 20 s), (2^14 threads, 35 s).
>>
>> Running iostat and vmstat in parallel confirms that no swapping
>> occurs.
>> 33% of the time is reported to be spent in the system (with 9% of
>> CPU time idle?!), with ~150k/sec system calls and ~120k/sec context
>> switches.
>>
>> Can anybody offer some insight into why this sudden degradation in
>> performance occurs?
>>
>
> Do you have administrator access to this system? With dtrace(1M), you
> can drill down on this and get more data on what's going on. See:
>
> http://www.sun.com/bigadmin/content/dtrace/
>
> for an intro. As a start, get the output from:
>
> dtrace -n 'profile-97{@[stack(), ustack()] = count();} END \
>   {trunc(@, 10);}' \
>   -c "your test command here"
>
> where 'your test command here' is replaced with an invocation with
> 2^13 threads. [1] This will give us more data to start from.
>
> Cheers,
> - jonathan
>
> [1] This invocation collects kernel and userland stack traces from
> all CPUs 97 times a second while your command is running. We count
> the number of times an identical stack trace is hit. When the test
> command finishes, we keep only the 10 "most-hit" traces, and the
> dtrace command prints them out, along with their counts. On an idle
> system, you'd see something like:
>
> # dtrace -n 'profile-97{@[stack(), ustack()] = count();} END \
>     {trunc(@, 10);}' -c 'sleep 2'
> dtrace: description 'profile-97' matched 2 probes
> dtrace: pid 102979 has exited
> CPU     ID                    FUNCTION:NAME
>  10      2                             :END
>
>               unix`mach_cpu_idle+0x17
>               unix`cpu_idle+0xdd
>               unix`idle+0x10e
>               unix`thread_start+0x8
>                3120
>
> --------------------------------------------------------------------------
> Jonathan Adams, Sun Microsystems, ZFS Team  http://blogs.sun.com/jwadams
>
> _______________________________________________
> perf-discuss mailing list
> perf-discuss@opensolaris.org