senthil ramanujam wrote:
> Good tips. I have been testing the performance difference of
> system calls, library routines, and IPC implementations between
> Solaris and Linux. This is one of the test codes I quickly put
> together to show the difference. I'll keep your options on the table
> and follow up as soon as we get to this point.
>
> many thanks.
Have you taken a look at libmicro? It might be just what you want!
http://opensolaris.org/os/project/libmicro/

-neel

> senthil
>
> On 7/1/08, [EMAIL PROTECTED] <[EMAIL PROTECTED]> wrote:
>> You're welcome. If the value you are incrementing is unique to each
>> thread, you can try forcing the values onto their own cache lines:
>>
>>     int value[64];
>>     #define VALUE_1 0    /* slots 16 ints (64 bytes) apart, so each  */
>>     #define VALUE_2 16   /* thread's slot sits on its own cache line */
>>     #define VALUE_3 32
>>     #define VALUE_4 48
>>
>> where value[VALUE_1] through value[VALUE_4] would each be the single
>> location used by one thread. This will remove any potential cache
>> thrashing.
>>
>> If you are using one shared value, then you could instead try placing
>> the lock and the value together on a single cache line (take the one
>> miss for both the lock and the update):
>>
>>     struct data {
>>         mutex lock;
>>         int count;
>>         int pad_to_64[14];
>>     };
>>
>> There are games you need to play with the compiler/loader to get these
>> lined up on their own cache line; what that game is depends on the
>> code.
>>
>> Dave V
>>
>> senthil ramanujam wrote:
>>
>>> Hi Dave,
>>>
>>> hmmm, I was kind of expecting this answer. Thanks for your analysis.
>>>
>>> senthil
>>>
>>> On 7/1/08, [EMAIL PROTECTED] <[EMAIL PROTECTED]> wrote:
>>>
>>>> Ok, this pretty much verifies what I thought. In short, by running
>>>> truss you have slowed the application down enough to significantly
>>>> reduce the lock contention on the program's mutex. This is one of
>>>> those performance items that appears contradictory on the surface.
>>>>
>>>> A bit more on this now:
>>>>
>>>> Your program has a very tight loop: you acquire the lock, increment
>>>> a variable, and drop the lock. There is a whole slew of atomic_*
>>>> operations that do essentially the same thing, just more efficiently
>>>> (whether they can be used depends on the case).
>>>>
>>>> Because the loop is as tight as it is, when the program runs without
>>>> truss the threads end up queuing on each other in the attempt to
>>>> acquire the lock. When we miss on a lock because it is owned by
>>>> another thread, we end up going into a totally different set of lock
>>>> routines, which is expensive. By running truss you have slowed
>>>> things down enough to reduce the lock contention seen on the mutex
>>>> in your program and the likelihood of entering the "slow" lock path.
>>>> If you were to add work before or after the mutex operation you
>>>> would see the same effect, probably to an even greater degree.
>>>>
>>>> Hope this helps.
>>>>
>>>> Dave Valin
>>>>
>>>> senthil ramanujam wrote:
>>>>
>>>>> Hi Dave,
>>>>>
>>>>> Here is the plockstat data for both cases. Let me know if you'd
>>>>> like to see more details.
>>>>>
>>>>> Looking forward to hearing your comments.
>>>>>
>>>>> thanks.
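As a point of reference, here is a minimal compilable sketch of the
per-thread, cache-line-padded layout Dave describes above, still using a
single shared pthread mutex as in the test under discussion. It assumes a
64-byte cache line and a GCC-style alignment attribute; the names and
structure are illustrative and are not taken from the original test
program.

    /*
     * Each thread's counter sits on its own 64-byte cache line, so the
     * counter updates themselves cannot false-share; only the mutex
     * line is contended.
     */
    #include <pthread.h>
    #include <stdio.h>

    #define CACHE_LINE 64
    #define NTHREADS   4
    #define LOOPS      1000000L

    static pthread_mutex_t lock = PTHREAD_MUTEX_INITIALIZER;

    typedef union {
        long count;
        char pad[CACHE_LINE];          /* pad each slot to a full line */
    } padded_counter_t;

    /* GCC-style alignment; the exact compiler/loader incantation is the
     * "game" mentioned above and varies by toolchain. */
    static padded_counter_t counters[NTHREADS]
            __attribute__((aligned(CACHE_LINE)));

    static void *worker(void *arg)
    {
        long id = (long)arg;

        for (long i = 0; i < LOOPS; i++) {
            pthread_mutex_lock(&lock);
            counters[id].count++;      /* private, padded location */
            pthread_mutex_unlock(&lock);
        }
        return NULL;
    }

    int main(void)
    {
        pthread_t tid[NTHREADS];

        for (long i = 0; i < NTHREADS; i++)
            pthread_create(&tid[i], NULL, worker, (void *)i);
        for (int i = 0; i < NTHREADS; i++) {
            pthread_join(tid[i], NULL);
            printf("thread %d: %ld\n", i, counters[i].count);
        }
        return 0;
    }

Dave's second variant (lock and counter together in one padded struct)
trades this for a single shared line that takes one miss for both the
lock and the update.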
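And, for the case where all threads really do share one counter, a sketch
of the atomic_* route mentioned above using the Solaris atomic_ops(3C)
interfaces. This is only an illustration of the idea; whether an atomic
can replace the mutex depends on what else the real critical section does.

    #include <atomic.h>
    #include <sys/types.h>

    static volatile ulong_t shared_count;

    static void bump(void)
    {
        /* Was: pthread_mutex_lock(&lock);
         *      shared_count++;
         *      pthread_mutex_unlock(&lock);
         * The whole critical section collapses to one atomic op. */
        atomic_inc_ulong(&shared_count);
    }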
>>>>> >>>>> senthil >>>>> >>>>> >>>>> $ plockstat -A mutex_ops -w4 -r10 >>>>> number of threads:4, run-time (in seconds)=10 >>>>> Thread 2: total ops: 456161, ops/second: 45616 >>>>> Thread 0: total ops: 452104, ops/second: 45210 >>>>> Thread 1: total ops: 167338, ops/second: 16733 >>>>> Thread 3: total ops: 174826, ops/second: 17482 >>>>> plockstat: pid 13497 has exited >>>>> >>>>> Mutex hold >>>>> >>>>> Count nsec Lock Caller >>>>> >>>>> >>>>> >> ------------------------------------------------------------------------------- >>>> >>>>> 1248577 1639 mutex_ops`worker_mutex mutex_ops`do_run+0x1b2 >>>>> 1851 1960 mutex_ops`worker_mutex mutex_ops`do_run+0x1b2 >>>>> 3 104789 mutex_ops`_iob+0xa8 mutex_ops`do_run+0x25a >>>>> 1 82008 libc.so.1`_uberdata+0xfc0 0x400e16 >>>>> 1 42599 libc.so.1`_first_link_lock libc.so.1`_exithandle+0x3b >>>>> 1 39967 libc.so.1`_uberdata+0x60 LM1`ld.so.1`rt_thr_init+0x29 >>>>> 4 9374 libc.so.1`_uberdata+0x1040 libc.so.1`_lwp_start >>>>> 4 8402 libc.so.1`_uberdata+0x40 >> libc.so.1`pthread_create+0x135 >>>>> 1 32844 mutex_ops`_iob+0x28 libc.so.1`_exithandle+0x3b >>>>> 3 9736 libc.so.1`_uberdata >>>>> >>>>> >>>> libc.so.1`_thr_exit_common+0x9c >>>> >>>> >>>>> 4 6343 libc.so.1`_uberdata mutex_ops`main+0x2e7 >>>>> 1 25319 mutex_ops`_iob+0xa8 mutex_ops`do_run+0x25a >>>>> 4 6162 libc.so.1`_uberdata mutex_ops`main+0x22e >>>>> 4 4369 libc.so.1`_uberdata >>>>> >>>>> >>>> libc.so.1`_thr_exit_common+0x9c >>>> >>>> >>>>> 2 6982 mutex_ops`worker_mutex mutex_ops`do_run+0xea >>>>> 6 1524 libc.so.1`_uberdata+0xd40 libc.so.1`exit+0x11 >>>>> 1 8422 libc.so.1`_uberdata >>>>> >>>>> >>>> libc.so.1`_thr_exit_common+0x9c >>>> >>>> >>>>> 4 1934 libc.so.1`_uberdata+0x1040 libc.so.1`_thrp_unwind+0x5e >>>>> 1 7646 0xfffffd7fff220100 >> libc.so.1`_thr_exit_common+0x9c >>>>> 2 3517 mutex_ops`worker_mutex mutex_ops`do_run+0xea >>>>> 1 6603 libc.so.1`_uberdata+0xd40 libc.so.1`libc_init+0x2c6 >>>>> 4 1516 libc.so.1`_uberdata+0x1000 libc.so.1`_thrp_unwind+0x5e >>>>> 1 6029 0xfffffd7fff220080 >> libc.so.1`pthread_create+0x135 >>>>> 1 5789 0xfffffd7fff220080 >> libc.so.1`_thr_exit_common+0x9c >>>>> 1 4387 0xfffffd7fff2200c0 >> libc.so.1`pthread_create+0x135 >>>>> 3 1447 libc.so.1`_uberdata+0xe80 libc.so.1`find_stack+0x18a >>>>> 3 1396 libc.so.1`_uberdata libc.so.1`_thrp_create+0x531 >>>>> 1 4078 0xfffffd7fff220100 >> libc.so.1`pthread_create+0x135 >>>>> 3 1309 libc.so.1`_uberdata >> libc.so.1`pthread_create+0xfe >>>>> 1 3900 0xfffffd7fff220140 >> libc.so.1`pthread_create+0x135 >>>>> 1 3811 0xfffffd7fff2200c0 >> libc.so.1`_thr_exit_common+0x9c >>>>> 1 3436 0xfffffd7fff220140 >> libc.so.1`_thr_exit_common+0x9c >>>>> 1 2827 0xfffffd7fff220080 >> libc.so.1`pthread_create+0xfe >>>>> 2 1367 libc.so.1`_uberdata+0xe40 libc.so.1`_thrp_unwind+0x5e >>>>> 1 2369 libc.so.1`_uberdata+0xd40 libc.so.1`_thrp_create+0x370 >>>>> 1 2341 mutex_ops`worker_mutex mutex_ops`do_run+0x1b2 >>>>> 1 2311 libc.so.1`_uberdata+0x1040 LM1`ld.so.1`rt_thr_init+0x29 >>>>> 1 2056 0xfffffd7fff220100 mutex_ops`main+0x2e7 >>>>> 1 1951 0xfffffd7fff2200c0 mutex_ops`main+0x2e7 >>>>> 1 1863 0xfffffd7fff220140 mutex_ops`main+0x2e7 >>>>> 1 1803 0xfffffd7fff220140 >> libc.so.1`pthread_create+0xfe >>>>> 1 1794 libc.so.1`_uberdata mutex_ops`main+0x22e >>>>> 1 1782 0xfffffd7fff220100 >> libc.so.1`pthread_create+0xfe >>>>> 1 1671 0xfffffd7fff2200c0 >> libc.so.1`pthread_create+0xfe >>>>> 1 1622 mutex_ops`_iob+0xa8 libc.so.1`_exithandle+0x3b >>>>> 1 1510 0xfffffd7fff220080 mutex_ops`main+0x2e7 >>>>> 1 1508 libc.so.1`_uberdata+0xe80 
libc.so.1`_thrp_create+0x531 >>>>> 1 1506 libc.so.1`_uberdata+0xd40 0x400dbb >>>>> 1 1495 libc.so.1`_uberdata+0xfc0 0x400dbb >>>>> 1 1446 libc.so.1`_uberdata+0xfc0 >> libc.so.1`pthread_create+0xfe >>>>> 1 1398 libc.so.1`_uberdata+0xd40 libc.so.1`_thrp_create+0x370 >>>>> 1 1389 libc.so.1`_uberdata >> libc.so.1`pthread_create+0xfe >>>>> 1 1370 libc.so.1`_uberdata+0xfc0 >> libc.so.1`pthread_create+0xfe >>>>> 1 1347 libc.so.1`_uberdata+0xd40 0x400dc7 >>>>> 1 1310 libc.so.1`_uberdata+0xfc0 0x400dc7 >>>>> 1 1194 libc.so.1`_uberdata+0xfc0 >> libc.so.1`pthread_create+0xfe >>>>> 1 1177 libc.so.1`_uberdata+0xd40 libc.so.1`_thrp_create+0x370 >>>>> >>>>> Mutex block >>>>> >>>>> Count nsec Lock Caller >>>>> >>>>> >>>>> >> ------------------------------------------------------------------------------- >>>> >>>>> 866 21921703 mutex_ops`worker_mutex mutex_ops`do_run+0x150 >>>>> 692495 8328 mutex_ops`worker_mutex mutex_ops`do_run+0x150 >>>>> 2 11086954 mutex_ops`worker_mutex mutex_ops`do_run+0x94 >>>>> 3 219879 mutex_ops`_iob+0xa8 libc.so.1`_flockget+0x1a >>>>> 1 12013 libc.so.1`_uberdata mutex_ops`main+0x2e7 >>>>> 1 2094 libc.so.1`_uberdata >>>>> >>>>> >>>> libc.so.1`_thr_exit_common+0x9c >>>> >>>> >>>>> << trusss + mutex >> >>>>> $ plockstat -A truss -fc mutex_ops -w4 -r10 >>>>> number of threads:4, run-time (in seconds)=10 >>>>> Thread 2: total ops: 17918909, ops/second: 1791890 >>>>> Thread 0: total ops: 18518051, ops/second: 1851805 >>>>> Thread 1: total ops: 18280090, ops/second: 1828009 >>>>> Thread 3: total ops: 18889775, ops/second: 1888977 >>>>> >>>>> syscall seconds calls errors >>>>> _exit .000 1 >>>>> write .000 5 >>>>> open .000 3 1 >>>>> close .000 2 >>>>> stat .000 10 7 >>>>> getpid .000 1 >>>>> fstat .000 1 >>>>> ioctl .000 1 >>>>> execve .000 1 >>>>> lwp_park .634 94191 >>>>> lwp_unpark .501 94191 >>>>> sigaction .000 1 >>>>> getcontext .000 1 >>>>> setustack .000 5 >>>>> mmap .000 16 >>>>> munmap .000 2 >>>>> getrlimit .000 1 >>>>> memcntl .000 1 >>>>> sysconfig .000 2 >>>>> yield .082 12845 >>>>> lwp_create .000 4 >>>>> lwp_exit .000 4 >>>>> lwp_continue .000 4 >>>>> lwp_sigmask .000 4 >>>>> lwp_private .000 1 >>>>> lwp_wait .000 4 >>>>> lwp_mutex_wakeup .000 10 >>>>> nanosleep .000 1 >>>>> schedctl .000 5 >>>>> resolvepath .000 4 >>>>> lwp_mutex_timedlock .000 68 >>>>> getcwd .000 1 >>>>> -------- ------ ---- >>>>> sys totals: 1.219 201391 8 >>>>> usr time: 17.809 >>>>> elapsed: 10.030 >>>>> plockstat: pid 13499 has exited >>>>> >>>>> Mutex hold >>>>> >>>>> Count nsec Lock Caller >>>>> >>>>> >>>>> >> ------------------------------------------------------------------------------- >>>> >>>>> 257123 8166 truss`truss_lock truss`worker_thread+0x3a7 >>>>> 145651 11250 truss`truss_lock truss`worker_thread+0x3a7 >>>>> 201384 1893 truss`count_lock truss`sysexit+0x82d >>>>> 1 2468522 libc.so.1`_uberdata+0x60 libproc.so.1`Pxcreate+0x63 >>>>> 1 2447046 libnsl.so.1`_ti_userlock >>>>> libnsl.so.1`_libnsl_parent_atfork+0x10 >>>>> 1 2380540 libc.so.1`_uberdata+0xfc0 libc.so.1`fork+0x130 >>>>> 1 2364179 libc.so.1`_first_link_lock libc.so.1`fork+0x130 >>>>> 1 2356270 libc.so.1`libc_malloc_lock libc.so.1`fork+0x130 >>>>> 1 2337553 libc.so.1`_uberdata+0x40 libproc.so.1`Pxcreate+0x63 >>>>> 4 52074 0x455478 libproc.so.1`Lgrab+0x234 >>>>> 1 144394 libc.so.1`_uberdata+0xfc0 truss`main_thread+0x1e8 >>>>> 4 32702 truss`truss_lock truss`worker_thread+0x5d6 >>>>> 5 23212 0x455478 libproc.so.1`Lfree+0x3a >>>>> 1 66469 0xfef9e1b0 0xfef06df3 >>>>> 1 64668 0xfefa12c0 0xfef67a91 >>>>> 32 1858 truss`_iob+0xa8 
truss`report+0x223 >>>>> 1 58652 0x455478 libproc.so.1`Lgrab+0x234 >>>>> 5 11034 libc.so.1`_uberdata >>>>> >>>>> >>>> libc.so.1`_thr_exit_common+0x9c >>>> >>>> >>>>> 32 1583 truss`_iob+0xa8 truss`prtim+0x51 >>>>> 32 1547 truss`_iob+0xa8 truss`report+0x25e >>>>> 2 23108 libc.so.1`libc_malloc_lock truss`my_realloc+0x18 >>>>> 32 1412 truss`_iob+0xa8 truss`prtim+0x64 >>>>> 1 40708 0xfefa12a0 0xfef021df >>>>> 1 40112 libc.so.1`_uberdata+0x60 LM1`ld.so.1`rt_thr_init+0x29 >>>>> 1 34837 libc.so.1`_first_link_lock libc.so.1`_exithandle+0x3b >>>>> 5 6891 truss`truss_lock truss`main_thread+0x116 >>>>> 1 34270 0xfef9e5d0 0xfef14052 >>>>> 4 8244 libc.so.1`_uberdata+0x40 truss`worker_thread+0xe52 >>>>> 1 31319 libc.so.1`libc_malloc_lock truss`my_realloc+0x18 >>>>> 5 6235 libc.so.1`_uberdata truss`main_thread+0x125 >>>>> 4 7557 libc.so.1`libc_malloc_lock libproc.so.1`Lgrab+0xd3 >>>>> 5 6010 libc.so.1`_uberdata+0x1040 libc.so.1`_lwp_start >>>>> 1 29687 libc.so.1`libc_malloc_lock truss`main+0x4ac >>>>> 1 28272 libc.so.1`mktemp_lock.0 truss`main+0xdb4 >>>>> 1 26436 truss`truss_lock truss`worker_thread+0x5d6 >>>>> 1 25322 truss`_iob+0xa8 libc.so.1`_exithandle+0x3b >>>>> 1 25197 libc.so.1`libc_malloc_lock truss`my_realloc+0x18 >>>>> 4 6142 libc.so.1`_uberdata truss`worker_thread+0xe34 >>>>> 1 21847 libc.so.1`libc_malloc_lock truss`my_calloc+0x18 >>>>> 1 19740 libproc.so.1`minfd_lock.0 libproc.so.1`set_minfd+0x46 >>>>> 1 17725 0xfffffd7fff060140 >> libc.so.1`_thr_exit_common+0x9c >>>>> 1 14834 libc.so.1`libc_malloc_lock truss`free_private+0x44 >>>>> 5 2865 libc.so.1`libc_malloc_lock truss`my_realloc+0x18 >>>>> 1 12900 0xfffffd7fff200080 truss`Xcritical+0x46 >>>>> 5 2470 libc.so.1`_uberdata >>>>> >>>>> >>>> libc.so.1`_thr_exit_common+0x9c >>>> >>>> >>>>> 1 12239 libc.so.1`__sbrk_lock >>>>> >>>>> >>>> libc.so.1`_malloc_unlocked+0x10d >>>> >>>> >>>>> 1 12155 0xfef9e1f0 0xfef0703b >>>>> 1 12062 libc.so.1`libc_malloc_lock truss`my_realloc+0x18 >>>>> 6 2004 truss`count_lock truss`worker_thread+0x108a >>>>> 1 11918 libc.so.1`_uberdata+0x40 truss`main_thread+0x1ed >>>>> 5 2347 truss`count_lock truss`worker_thread+0xb09 >>>>> 1 11608 0xfef9f708 0xfef26ac9 >>>>> 5 2215 libc.so.1`libc_malloc_lock >>>>> >>>>> >>>> libproc.so.1`Lfree_internal+0x67 >>>> >>>> >>>>> 1 10685 libc.so.1`update_lock >> libctf.so.1`_libctf_init+0x10 >>>>> 1 10092 libc.so.1`_uberdata libproc.so.1`Pxcreate+0x63 >>>>> 5 2016 libc.so.1`_uberdata+0x1040 libc.so.1`_thrp_unwind+0x5e >>>>> 1 9656 0x455478 libproc.so.1`Pfree+0x117 >>>>> 2 4810 libc.so.1`__sbrk_lock libc.so.1`realloc+0x235 >>>>> 5 1871 0xfffffd7fff200080 truss`Xcritical+0x46 >>>>> 6 1543 libc.so.1`_uberdata+0xd40 libc.so.1`exit+0x11 >>>>> 5 1744 libc.so.1`libc_malloc_lock truss`free_private+0x55 >>>>> 5 1738 libc.so.1`_uberdata+0x1000 libc.so.1`_thrp_unwind+0x5e >>>>> 5 1688 libc.so.1`libc_malloc_lock truss`free_private+0x66 >>>>> 1 8407 truss`count_lock truss`setup_basetime+0x60 >>>>> 1 8252 libc.so.1`libc_malloc_lock truss`my_realloc+0x18 >>>>> 2 4044 libc.so.1`__sbrk_lock libc.so.1`malloc+0x2d >>>>> 1 8072 libc.so.1`libc_malloc_lock libproc.so.1`Lgrab+0x72 >>>>> 5 1612 libc.so.1`_uberdata+0x1000 libc.so.1`_thrp_unwind+0x5e >>>>> 1 8046 libc.so.1`libc_malloc_lock truss`my_realloc+0x18 >>>>> 5 1543 libc.so.1`libc_malloc_lock truss`my_realloc+0x18 >>>>> 2 3818 libc.so.1`__sbrk_lock libc.so.1`realloc+0x235 >>>>> 4 1897 libc.so.1`_uberdata libc.so.1`thr_create+0x3f >>>>> 1 7516 0xfef9e5d0 0xfef14052 >>>>> 5 1497 libc.so.1`libc_malloc_lock truss`my_realloc+0x18 >>>>> 1 7375 0xfffffd7fff060080 
truss`main_thread+0x1ed >>>>> 1 7372 libc.so.1`_uberdata truss`create_thread+0x58 >>>>> 1 6977 libc.so.1`_uberdata+0xd40 libc.so.1`libc_init+0x2c6 >>>>> 1 6664 0xfefa1f80 0xfef63d56 >>>>> 4 1663 libc.so.1`_uberdata+0xe80 libc.so.1`find_stack+0x18a >>>>> 1 6569 0xfffffd7fff060180 >> libc.so.1`_thr_exit_common+0x9c >>>>> 2 3186 libc.so.1`__sbrk_lock libc.so.1`malloc+0x2d >>>>> 1 6260 libc.so.1`_uberdata+0x1000 truss`main+0xf8 >>>>> 1 5847 libc.so.1`libc_malloc_lock libproc.so.1`Lgrab+0xd3 >>>>> 1 5799 libc.so.1`_uberdata+0x60 >>>>> >>>>> >>>> libnsl.so.1`_libnsl_lock_init+0x86 >>>> >>>> >>>>> 1 5664 0xfffffd7fff060140 truss`worker_thread+0xe52 >>>>> 1 5502 0xfefa2240 0xfef734b2 >>>>> 4 1338 libc.so.1`_uberdata libc.so.1`_thrp_create+0x531 >>>>> 1 5248 0xfffffd7fff0600c0 truss`worker_thread+0xe52 >>>>> 1 5204 libc.so.1`__sbrk_lock >>>>> >>>>> >>>> libc.so.1`_malloc_unlocked+0x10d >>>> >>>> >>>>> 1 5141 truss`count_lock truss`per_proc_init+0xe1 >>>>> 1 5105 0xfefa10e0 0xfef62237 >>>>> 1 4904 0xfefa10e0 0xfef62237 >>>>> 3 1613 libc.so.1`libc_malloc_lock truss`free_private+0x44 >>>>> 1 4403 libc.so.1`_uberdata+0x140 truss`main+0xbeb >>>>> 1 4245 0xfffffd7fff060100 truss`worker_thread+0xe52 >>>>> 1 3702 0xfffffd7fff0600c0 >> libc.so.1`_thr_exit_common+0x9c >>>>> 1 3469 truss`_iob+0xa8 truss`report+0x418 >>>>> 2 1702 libc.so.1`libc_malloc_lock truss`my_realloc+0x18 >>>>> 1 3396 0xfffffd7fff060100 >> libc.so.1`_thr_exit_common+0x9c >>>>> 2 1641 libc.so.1`_uberdata+0xe40 libc.so.1`_thrp_unwind+0x5e >>>>> 2 1514 truss`_iob+0xa8 truss`report+0x366 >>>>> 2 1477 libc.so.1`libc_malloc_lock truss`my_realloc+0x18 >>>>> 2 1369 libc.so.1`libc_malloc_lock truss`sysentry+0x4e7 >>>>> 1 2653 0xfffffd7fff060080 libc.so.1`thr_create+0x3f >>>>> 1 2646 libc.so.1`_uberdata+0x140 truss`main+0xbff >>>>> 1 2528 truss`truss_lock truss`main_thread+0x116 >>>>> 1 2470 libc.so.1`_uberdata+0x580 truss`main+0x554 >>>>> 1 2452 0xfffffd7fff200080 truss`Xcritical+0x46 >>>>> 1 2391 0xfffffd7fff060080 >> libc.so.1`_thr_exit_common+0x9c >>>>> 1 2223 libc.so.1`_uberdata+0x440 truss`main+0x545 >>>>> 1 2219 libc.so.1`libc_malloc_lock truss`my_realloc+0x18 >>>>> 1 2203 libc.so.1`_uberdata+0x4c0 truss`main+0x518 >>>>> 1 2187 libc.so.1`_uberdata+0x500 truss`main+0x527 >>>>> 1 2185 libc.so.1`_uberdata+0xd40 libc.so.1`_thrp_create+0x370 >>>>> 1 2182 0xfefa2280 0xfef73271 >>>>> 1 2127 0xfffffd7fff060080 truss`main_thread+0x125 >>>>> 1 2113 0xfffffd7fff060180 truss`worker_thread+0xe52 >>>>> 1 2090 libc.so.1`_uberdata+0x1040 LM1`ld.so.1`rt_thr_init+0x29 >>>>> 1 2082 libc.so.1`libc_malloc_lock truss`my_realloc+0x18 >>>>> 1 2072 libc.so.1`_uberdata+0x540 truss`main+0x536 >>>>> 1 2057 libc.so.1`_uberdata+0x180 truss`main+0xc0e >>>>> 1 2041 libc.so.1`_uberdata+0x1c0 truss`main+0xc31 >>>>> 1 2035 0xfffffd7fff060140 truss`main_thread+0x125 >>>>> 1 2013 0xfffffd7fff060140 libc.so.1`thr_create+0x3f >>>>> 1 1968 libc.so.1`libc_malloc_lock truss`main+0xdf1 >>>>> 1 1961 0xfefa2000 0xfef63d56 >>>>> 1 1954 libc.so.1`_uberdata libproc.so.1`Pxcreate+0x63 >>>>> 1 1927 0xfffffd7fff0600c0 truss`main_thread+0x125 >>>>> 1 1923 libc.so.1`_uberdata+0x180 truss`main+0xc22 >>>>> 1 1918 0xfffffd7fff060100 libc.so.1`thr_create+0x3f >>>>> 1 1917 truss`_iob+0xa8 truss`prtim+0x20 >>>>> 1 1891 0xfffffd7fff060180 truss`main_thread+0x125 >>>>> 1 1890 libc.so.1`_uberdata+0x1c0 truss`main+0x507 >>>>> 1 1884 libc.so.1`seed_lk truss`main+0xdb4 >>>>> 1 1877 libc.so.1`libc_malloc_lock libproc.so.1`Pfree+0x10f >>>>> 1 1875 libc.so.1`libc_malloc_lock libproc.so.1`Pxcreate+0x52 >>>>> 
1 1867 libc.so.1`libc_malloc_lock truss`main+0xdb4 >>>>> 1 1832 0xfffffd7fff0600c0 libc.so.1`thr_create+0x3f >>>>> 1 1830 0xfffffd7fff060100 truss`main_thread+0x125 >>>>> 1 1770 libc.so.1`_uberdata+0xfc0 libc.so.1`thr_create+0x3f >>>>> 1 1765 libc.so.1`libc_malloc_lock truss`my_realloc+0x18 >>>>> 1 1754 truss`_iob+0x28 libc.so.1`_exithandle+0x3b >>>>> 1 1732 libc.so.1`_uberdata+0xd40 truss`main+0xf8 >>>>> 1 1723 libc.so.1`_uberdata+0xd40 truss`_start+0x1b >>>>> 1 1695 truss`_iob+0xa8 truss`report+0x4af >>>>> 1 1681 libc.so.1`_uberdata+0xd40 >>>>> >>>>> >>>> libnsl.so.1`_libnsl_lock_init+0x86 >>>> >>>> >>>>> 1 1674 libc.so.1`_uberdata+0xe80 libc.so.1`_thrp_create+0x531 >>>>> 1 1649 0xfefa1f80 0xfef63d56 >>>>> 1 1648 truss`_iob+0xa8 truss`report+0x492 >>>>> 1 1647 0xfef9e5d0 0xfef14052 >>>>> 1 1642 0xfffffd7fff200080 truss`Xcritical+0x46 >>>>> 1 1640 truss`_iob+0xa8 truss`report+0x486 >>>>> 1 1637 truss`_iob+0xa8 truss`prtim+0x20 >>>>> 1 1624 0xfffffd7fff060180 libc.so.1`thr_create+0x3f >>>>> 1 1616 0xfefa2200 0xfef01d17 >>>>> 1 1608 libc.so.1`_uberdata truss`create_thread+0x58 >>>>> 1 1608 truss`_iob+0xa8 truss`prtim+0x51 >>>>> 1 1547 truss`_iob+0xa8 truss`report+0x53e >>>>> 1 1547 truss`_iob+0xa8 truss`prtim+0x51 >>>>> 1 1533 0xfefa2040 0xfef63d56 >>>>> 1 1518 0xfefa1f80 0xfef63d56 >>>>> 1 1505 libc.so.1`libc_malloc_lock libproc.so.1`Pfree+0x1cf >>>>> 1 1488 0xfefa1f80 0xfef63d56 >>>>> 1 1466 truss`_iob+0xa8 truss`prtim+0x51 >>>>> 1 1445 libc.so.1`libc_malloc_lock truss`free_private+0x88 >>>>> 1 1443 libc.so.1`libc_malloc_lock truss`my_realloc+0x18 >>>>> 1 1426 truss`_iob+0xa8 truss`report+0x4cb >>>>> 1 1424 truss`_iob+0xa8 truss`prtim+0x20 >>>>> 1 1415 0xfef9e5d0 0xfef14052 >>>>> 1 1404 libc.so.1`_uberdata+0xfc0 libc.so.1`thr_create+0x3f >>>>> 1 1397 libc.so.1`_uberdata+0xd40 libc.so.1`_thrp_create+0x370 >>>>> 1 1378 libc.so.1`_uberdata+0xd40 libc.so.1`_thrp_create+0x370 >>>>> 1 1372 libc.so.1`_uberdata libc.so.1`thr_create+0x3f >>>>> 1 1363 libc.so.1`libc_malloc_lock truss`free_private+0x90 >>>>> 1 1345 libc.so.1`_uberdata+0xd40 truss`_start+0x27 >>>>> 1 1343 0xfefa1f80 0xfef63d56 >>>>> 1 1341 libc.so.1`_uberdata+0xfc0 truss`_start+0x1b >>>>> 1 1340 0xfefa2200 0xfef01d17 >>>>> 1 1324 0xfefa1f80 0xfef63d56 >>>>> 1 1284 libc.so.1`_uberdata+0xfc0 libc.so.1`thr_create+0x3f >>>>> 1 1142 libc.so.1`_uberdata+0xfc0 truss`_start+0x27 >>>>> >>>>> Mutex block >>>>> >>>>> Count nsec Lock Caller >>>>> >>>>> >>>>> >> ------------------------------------------------------------------------------- >>>> >>>>> 124379 34360 truss`truss_lock truss`worker_thread+0x27d >>>>> 88153 2193 truss`truss_lock truss`worker_thread+0x27d >>>>> 3 55924 truss`truss_lock truss`worker_thread+0x5c4 >>>>> 1 12080 libc.so.1`libc_malloc_lock libc.so.1`free+0x18 >>>>> 1 2286 truss`truss_lock truss`main_thread+0xed >>>>> $ >>>>> >>>>> On 7/1/08, [EMAIL PROTECTED] <[EMAIL PROTECTED]> wrote: >>>>> >>>>> >>>>> >>>>> >>>>>> Could you please provide lockstat information from plockstat for >> each of >>>>>> these >>>>>> cases. I think I know what you are seeing, but I want data to back >> up >>>>>> >>>> the >>>> >>>> >>>>>> theory. >>>>>> >>>>>> Thanks >>>>>> >>>>>> Dave Valin >>>>>> >>>>>> >>>>>> senthil ramanujam wrote: >>>>>> >>>>>> >>>>>> >>>>>> >>>>>> >>>>>>> Hi, >>>>>>> >>>>>>> I am seeing a strange performance behavior with pthread mutex >>>>>>> operations on my Solaris (4-core) system running S10U5. 
>>>>>>>
>>>>>>> The issue is that if the program is run with truss enabled, I am
>>>>>>> getting about three times the performance of the run without
>>>>>>> truss enabled. Strange... right?
>>>>>>>
>>>>>>> The C program does the following:
>>>>>>> - accept 2 parameters [number of concurrent threads (N), and run-time]
>>>>>>> - create and run N threads simultaneously for <run-time> seconds
>>>>>>> - each thread does the following:
>>>>>>>     - acquires the pthread mutex
>>>>>>>     - increases the value of its _own_ long variable by 1
>>>>>>>     - releases the pthread mutex
>>>>>>> - after run-time expires, each thread reports the value of its
>>>>>>>   long variable
>>>>>>>
>>>>>>> Note that I did not use any special flags to compile it. Reply to
>>>>>>> this email if you'd like to take a look at the program source code.
>>>>>>>
>>>>>>> Let's look at the program without truss enabled.
>>>>>>>
>>>>>>> $ mutex_ops -w4 -r10
>>>>>>> number of threads:4, run-time (in seconds)=10
>>>>>>> Thread 1: total ops: 4580775, ops/second: 458077
>>>>>>> Thread 3: total ops: 3798001, ops/second: 379800
>>>>>>> Thread 0: total ops: 4867850, ops/second: 486785
>>>>>>> Thread 2: total ops: 4925299, ops/second: 492529
>>>>>>> $
>>>>>>>
>>>>>>> <<vmstat 1>>
>>>>>>> kthr memory page disk faults cpu
>>>>>>> r b w swap free re mf pi po fr de sr s0 s1 -- -- in sy cs us sy id
>>>>>>> 0 0 0 17121968 14961748 22 205 0 0 0 0 0 0 0 0 0 7960 16218 16155 28 2 71
>>>>>>> 0 0 0 17121968 14961748 0 0 0 0 0 0 0 0 0 0 0 19914 41572 41568 81 3 16
>>>>>>> 0 0 0 17121968 14961748 0 0 0 0 0 0 0 0 0 0 0 18711 39086 38937 80 3 17
>>>>>>> 0 0 0 17121968 14961748 0 0 0 0 0 0 0 0 0 0 0 19006 39373 39382 80 3 17
>>>>>>> 0 0 0 17121968 14961748 0 0 0 0 0 0 0 0 0 0 0 18719 39151 39007 80 3 16
>>>>>>> 0 0 0 17121968 14961748 0 0 0 0 0 0 0 0 0 0 0 19560 41243 40963 81 4 16
>>>>>>> 0 0 0 17121968 14961748 0 0 0 0 0 0 0 0 0 0 0 20498 43671 43203 81 4 16
>>>>>>> 0 0 0 17121968 14961748 0 0 0 0 0 0 0 0 0 0 0 20857 44636 44014 80 4 15
>>>>>>> 0 0 0 17121968 14961748 0 0 0 0 0 0 0 0 0 0 0 19149 41239 40583 80 4 16
>>>>>>> 0 0 0 17121968 14961748 0 0 0 0 0 0 0 0 0 0 0 21019 44189 43995 81 4 15
>>>>>>> 0 0 0 17122372 14962176 7 29 0 0 0 0 0 0 0 0 0 13062 27003 26973 46 2 52
>>>>>>> 0 0 0 17122372 14962176 0 0 0 0 0 0 0 0 0 0 0 461 160 274 0 0 100
>>>>>>>
>>>>>>> As you can see above, I am executing the program with 4 concurrent
>>>>>>> threads for 10 seconds.
>>>>>>> The output shows that each thread has acquired the mutex lock
>>>>>>> about 400,000 times a second. vmstat shows that the program used
>>>>>>> about 80% user time, 4% system, and ~15% idle. So far so good.
>>>>>>>
>>>>>>> Now, let's look at the program with truss enabled.
>>>>>>>
>>>>>>> $ truss -fc mutex_ops -w4 -r10
>>>>>>> number of threads:4, run-time (in seconds)=10
>>>>>>> Thread 0: total ops: 13419838, ops/second: 1341983
>>>>>>> Thread 2: total ops: 13959115, ops/second: 1395911
>>>>>>> Thread 1: total ops: 13414291, ops/second: 1341429
>>>>>>> Thread 3: total ops: 13535670, ops/second: 1353567
>>>>>>>
>>>>>>> syscall               seconds   calls  errors
>>>>>>> _exit                    .000       1
>>>>>>> write                    .000       5
>>>>>>> open                     .000       3       1
>>>>>>> close                    .000       2
>>>>>>> stat                     .000      10       7
>>>>>>> getpid                   .000       1
>>>>>>> fstat                    .000       1
>>>>>>> ioctl                    .000       1
>>>>>>> execve                   .000       1
>>>>>>> lwp_park                 .905  106423
>>>>>>> lwp_unpark               .757  106423
>>>>>>> sigaction                .000       1
>>>>>>> getcontext               .000       1
>>>>>>> setustack                .000       5
>>>>>>> mmap                     .000      16
>>>>>>> munmap                   .000       2
>>>>>>> getrlimit                .000       1
>>>>>>> memcntl                  .000       1
>>>>>>> sysconfig                .000       2
>>>>>>> yield                    .182   26004
>>>>>>> lwp_create               .000       4
>>>>>>> lwp_exit                 .000       4
>>>>>>> lwp_continue             .000       4
>>>>>>> lwp_sigmask              .000       4
>>>>>>> lwp_private              .000       1
>>>>>>> lwp_wait                 .000       4
>>>>>>> lwp_mutex_wakeup         .000       9
>>>>>>> nanosleep                .000       1
>>>>>>> schedctl                 .000       5
>>>>>>> resolvepath              .000       4
>>>>>>> lwp_mutex_timedlock      .000      32
>>>>>>> getcwd                   .000       1
>>>>>>>                      --------   ------    ----
>>>>>>> sys totals:             1.846   238977       8
>>>>>>> usr time:              21.363
>>>>>>> elapsed:               10.060
>>>>>>> $
>>>>>>>
>>>>>>> <<vmstat 1>>
>>>>>>> kthr memory page disk faults cpu
>>>>>>> r b w swap free re mf pi po fr de sr s0 s1 -- -- in sy cs us sy id
>>>>>>> 0 0 0 17122372 14962176 0 0 0 0 0 0 0 0 0 0 0 529 262 335 0 0 100
>>>>>>> 0 0 0 17121240 14960660 153 632 0 0 0 0 0 0 0 0 0 25259 91754 87992 37 15 48
>>>>>>> 0 0 0 17121104 14960524 0 0 0 0 0 0 0 0 0 0 0 62751 178228 179102 57 33 10
>>>>>>> 0 0 0 17121104 14960524 0 0 0 0 0 0 0 0 0 0 0 56922 169565 168507 59 30 11
>>>>>>> 0 0 0 17121104 14960524 0 0 0 0 0 0 0 0 0 0 0 60935 178967 178535 58 32 10
>>>>>>> 0 0 0 17121104 14960524 0 0 0 0 0 0 0 0 0 0 0 61581 180239 179751 58 32 10
>>>>>>> 0 0 0 17121104 14960524 0 0 0 0 0 0 0 0 0 0 0 61486 181664 180513 58 33 10
>>>>>>> 1 0 0 17121104 14960524 0 0 0 0 0 0 0 0 0 0 0 67941 186205 190703 55 35 10
>>>>>>> 0 0 0 17121104 14960524 0 0 0 0 0 0 0 0 0 0 0 59981 175708 175758 58 32 10
>>>>>>> 1 0 0 17121104 14960524 0 0 0 0 0 0 0 0 0 0 0 60485 174557 175091 58 32 11
>>>>>>> 1 0 0 17121104 14960524 0 0 0 0 0 0 0 0 0 0 0 62799 178328 179880 57 32 10
>>>>>>> 0 0 0 17122236 14962040 8 17 0 0 0 0 0 0 0 0 0 19483 58122 58317 19 10 71
>>>>>>> 0 0 0 17122236 14962040 0 0 0 0 0 0 0 0 0 0 0 449 160 279 0 0 100
>>>>>>>
>>>>>>> Can you spot the issue? I used the same parameters (4 concurrent
>>>>>>> threads, 10 seconds run-time), but this time *with truss enabled*.
>>>>>>> The output shows that each thread has acquired the mutex lock about
>>>>>>> 1,300,000 times a second. vmstat shows that the program is using a
>>>>>>> lot more system time than before.
>>>>>>> However, I don't know how truss can cause the system time to go
>>>>>>> up, which ultimately increases the performance. Am I missing
>>>>>>> something?
>>>>>>>
>>>>>>> BTW, I confirmed the mutex lock counts generated by the program
>>>>>>> using dtrace. The values are indeed correct for both cases.
>>>>>>>
>>>>>>> Any inputs/comments are appreciated.
>>>>>>>
>>>>>>> thanks.
>>>>>>>
>>>>>>> senthil

--
Neelakanth Nadgir
http://blogs.sun.com/realneel

_______________________________________________
perf-discuss mailing list
perf-discuss@opensolaris.org
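For reference, a minimal sketch of the kind of test program described
earlier in the thread. The original source was never posted, so apart
from the worker_mutex and do_run names visible in the plockstat output,
every name, option, and detail below is an assumption made for
illustration.

    #include <pthread.h>
    #include <stdio.h>
    #include <stdlib.h>
    #include <unistd.h>

    #define MAX_THREADS 64

    static pthread_mutex_t worker_mutex = PTHREAD_MUTEX_INITIALIZER;
    static volatile int stop;            /* set by main when run-time expires */
    static long counts[MAX_THREADS];     /* one counter per thread */

    static void *do_run(void *arg)
    {
        long id = (long)arg;

        while (!stop) {
            pthread_mutex_lock(&worker_mutex);
            counts[id]++;                /* each thread bumps its own slot */
            pthread_mutex_unlock(&worker_mutex);
        }
        return NULL;
    }

    int main(int argc, char **argv)
    {
        int nthreads = 4, runtime = 10, c;
        pthread_t tid[MAX_THREADS];

        while ((c = getopt(argc, argv, "w:r:")) != -1) {
            if (c == 'w') nthreads = atoi(optarg);
            if (c == 'r') runtime = atoi(optarg);
        }
        if (nthreads > MAX_THREADS)
            nthreads = MAX_THREADS;
        printf("number of threads:%d, run-time (in seconds)=%d\n",
               nthreads, runtime);

        for (long i = 0; i < nthreads; i++)
            pthread_create(&tid[i], NULL, do_run, (void *)i);

        sleep(runtime);                  /* let the workers run */
        stop = 1;

        for (int i = 0; i < nthreads; i++) {
            pthread_join(tid[i], NULL);
            printf("Thread %d: total ops: %ld, ops/second: %ld\n",
                   i, counts[i], counts[i] / runtime);
        }
        return 0;
    }

A loop of this shape is what produces the counter-intuitive result above:
without truss the threads pile up on worker_mutex and keep falling into
the slow, syscall-based lock path, while under truss everything is slowed
just enough that the lock is usually free when a thread asks for it.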