Ok, this pretty much verifies what I thought. In short, by running truss you have slowed the application down enough to significantly reduce the lock contention on the program's mutex. This is one of those performance items that appears contradictory on the surface.

A bit more detail: your program has a very tight loop in which you acquire the lock, increment a variable, and drop the lock. There is a whole slew of atomic_* operations that do essentially the same thing, only more efficiently (whether they can be used depends on the case). Because the loop is so tight, when the program runs without truss the threads end up queuing on each other trying to acquire the lock. When we miss on a lock because it is owned by another thread, we drop into a completely different set of lock routines, which is expensive. By running truss you have slowed things down enough to reduce the contention seen on the mutex in your program and the likelihood of entering the "slow" lock path. If you were to add work before or after the mutex operation you would see the same effect, probably to an even greater degree.
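For illustration only, here is a rough sketch of the two patterns (this is not your mutex_ops source; worker_mutex matches the lock name in your plockstat output, everything else is made up for the example). The first loop is the shape you described; the second uses the atomic_ops(3C) interface and only applies when the shared state really is nothing more than a counter:

    #include <pthread.h>
    #include <atomic.h>      /* Solaris atomic_ops(3C), e.g. atomic_inc_ulong() */
    #include <sys/types.h>   /* ulong_t */

    static pthread_mutex_t worker_mutex = PTHREAD_MUTEX_INITIALIZER;
    static volatile int stop;        /* set by the main thread when run-time expires */

    /*
     * Mutex version: the critical section is a single increment, so the
     * threads arrive at the lock almost back to back and any miss drops
     * the caller into the contended ("slow") lock path.
     */
    static void
    count_with_mutex(unsigned long *my_counter)
    {
            while (!stop) {
                    (void) pthread_mutex_lock(&worker_mutex);
                    (*my_counter)++;
                    (void) pthread_mutex_unlock(&worker_mutex);
            }
    }

    /*
     * Atomic version: when the only thing the lock protects is a counter,
     * a single atomic instruction does the same work with no lock to miss on.
     */
    static void
    count_with_atomic(volatile ulong_t *my_counter)
    {
            while (!stop)
                    atomic_inc_ulong(my_counter);
    }

In your test the mutex itself is the thing being measured (each thread bumps its own variable under the shared lock), so the atomic form is not a drop-in replacement there; it just shows where the cost difference comes from. Doing some private work before or after the lock/unlock pair lowers the arrival rate at the mutex in much the same way that truss does.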
Hope this helps.

Dave Valin

senthil ramanujam wrote:
> Hi Dave,
>
> Here is the plockstat data for both the cases. Let me know if you like
> to see more details.
>
> Looking forward to hear your comments.
>
> thanks.
>
> senthil
>
>
> $ plockstat -A mutex_ops -w4 -r10
> number of threads:4, run-time (in seconds)=10
> Thread 2: total ops: 456161, ops/second: 45616
> Thread 0: total ops: 452104, ops/second: 45210
> Thread 1: total ops: 167338, ops/second: 16733
> Thread 3: total ops: 174826, ops/second: 17482
> plockstat: pid 13497 has exited
>
> Mutex hold
>
> Count nsec Lock Caller
> -------------------------------------------------------------------------------
> 1248577 1639 mutex_ops`worker_mutex mutex_ops`do_run+0x1b2
> 1851 1960 mutex_ops`worker_mutex mutex_ops`do_run+0x1b2
> 3 104789 mutex_ops`_iob+0xa8 mutex_ops`do_run+0x25a
> 1 82008 libc.so.1`_uberdata+0xfc0 0x400e16
> 1 42599 libc.so.1`_first_link_lock libc.so.1`_exithandle+0x3b
> 1 39967 libc.so.1`_uberdata+0x60 LM1`ld.so.1`rt_thr_init+0x29
> 4 9374 libc.so.1`_uberdata+0x1040 libc.so.1`_lwp_start
> 4 8402 libc.so.1`_uberdata+0x40 libc.so.1`pthread_create+0x135
> 1 32844 mutex_ops`_iob+0x28 libc.so.1`_exithandle+0x3b
> 3 9736 libc.so.1`_uberdata libc.so.1`_thr_exit_common+0x9c
> 4 6343 libc.so.1`_uberdata mutex_ops`main+0x2e7
> 1 25319 mutex_ops`_iob+0xa8 mutex_ops`do_run+0x25a
> 4 6162 libc.so.1`_uberdata mutex_ops`main+0x22e
> 4 4369 libc.so.1`_uberdata libc.so.1`_thr_exit_common+0x9c
> 2 6982 mutex_ops`worker_mutex mutex_ops`do_run+0xea
> 6 1524 libc.so.1`_uberdata+0xd40 libc.so.1`exit+0x11
> 1 8422 libc.so.1`_uberdata libc.so.1`_thr_exit_common+0x9c
> 4 1934 libc.so.1`_uberdata+0x1040 libc.so.1`_thrp_unwind+0x5e
> 1 7646 0xfffffd7fff220100 libc.so.1`_thr_exit_common+0x9c
> 2 3517 mutex_ops`worker_mutex mutex_ops`do_run+0xea
> 1 6603 libc.so.1`_uberdata+0xd40 libc.so.1`libc_init+0x2c6
> 4 1516 libc.so.1`_uberdata+0x1000 libc.so.1`_thrp_unwind+0x5e
> 1 6029 0xfffffd7fff220080 libc.so.1`pthread_create+0x135
> 1 5789 0xfffffd7fff220080 libc.so.1`_thr_exit_common+0x9c
> 1 4387 0xfffffd7fff2200c0 libc.so.1`pthread_create+0x135
> 3 1447 libc.so.1`_uberdata+0xe80 libc.so.1`find_stack+0x18a
> 3 1396 libc.so.1`_uberdata libc.so.1`_thrp_create+0x531
> 1 4078 0xfffffd7fff220100 libc.so.1`pthread_create+0x135
> 3 1309 libc.so.1`_uberdata libc.so.1`pthread_create+0xfe
> 1 3900 0xfffffd7fff220140 libc.so.1`pthread_create+0x135
> 1 3811 0xfffffd7fff2200c0 libc.so.1`_thr_exit_common+0x9c
> 1 3436 0xfffffd7fff220140 libc.so.1`_thr_exit_common+0x9c
> 1 2827 0xfffffd7fff220080 libc.so.1`pthread_create+0xfe
> 2 1367 libc.so.1`_uberdata+0xe40 libc.so.1`_thrp_unwind+0x5e
> 1 2369 libc.so.1`_uberdata+0xd40 libc.so.1`_thrp_create+0x370
> 1 2341 mutex_ops`worker_mutex mutex_ops`do_run+0x1b2
> 1 2311 libc.so.1`_uberdata+0x1040 LM1`ld.so.1`rt_thr_init+0x29
> 1 2056 0xfffffd7fff220100 mutex_ops`main+0x2e7
> 1 1951 0xfffffd7fff2200c0 mutex_ops`main+0x2e7
> 1 1863 0xfffffd7fff220140 mutex_ops`main+0x2e7
> 1 1803 0xfffffd7fff220140 libc.so.1`pthread_create+0xfe
> 1 1794 libc.so.1`_uberdata mutex_ops`main+0x22e
> 1 1782 0xfffffd7fff220100 libc.so.1`pthread_create+0xfe
> 1 1671 0xfffffd7fff2200c0 libc.so.1`pthread_create+0xfe
> 1 1622 mutex_ops`_iob+0xa8 libc.so.1`_exithandle+0x3b
> 1 1510 0xfffffd7fff220080 mutex_ops`main+0x2e7
> 1 1508 libc.so.1`_uberdata+0xe80 libc.so.1`_thrp_create+0x531
> 1 1506 libc.so.1`_uberdata+0xd40 0x400dbb
> 1 1495 libc.so.1`_uberdata+0xfc0 0x400dbb
> 1 1446 libc.so.1`_uberdata+0xfc0 libc.so.1`pthread_create+0xfe
> 1 1398 libc.so.1`_uberdata+0xd40 libc.so.1`_thrp_create+0x370
> 1 1389 libc.so.1`_uberdata libc.so.1`pthread_create+0xfe
> 1 1370 libc.so.1`_uberdata+0xfc0 libc.so.1`pthread_create+0xfe
> 1 1347 libc.so.1`_uberdata+0xd40 0x400dc7
> 1 1310 libc.so.1`_uberdata+0xfc0 0x400dc7
> 1 1194 libc.so.1`_uberdata+0xfc0 libc.so.1`pthread_create+0xfe
> 1 1177 libc.so.1`_uberdata+0xd40 libc.so.1`_thrp_create+0x370
>
> Mutex block
>
> Count nsec Lock Caller
> -------------------------------------------------------------------------------
> 866 21921703 mutex_ops`worker_mutex mutex_ops`do_run+0x150
> 692495 8328 mutex_ops`worker_mutex mutex_ops`do_run+0x150
> 2 11086954 mutex_ops`worker_mutex mutex_ops`do_run+0x94
> 3 219879 mutex_ops`_iob+0xa8 libc.so.1`_flockget+0x1a
> 1 12013 libc.so.1`_uberdata mutex_ops`main+0x2e7
> 1 2094 libc.so.1`_uberdata libc.so.1`_thr_exit_common+0x9c
>
>
> << trusss + mutex >>
> $ plockstat -A truss -fc mutex_ops -w4 -r10
> number of threads:4, run-time (in seconds)=10
> Thread 2: total ops: 17918909, ops/second: 1791890
> Thread 0: total ops: 18518051, ops/second: 1851805
> Thread 1: total ops: 18280090, ops/second: 1828009
> Thread 3: total ops: 18889775, ops/second: 1888977
>
> syscall seconds calls errors
> _exit .000 1
> write .000 5
> open .000 3 1
> close .000 2
> stat .000 10 7
> getpid .000 1
> fstat .000 1
> ioctl .000 1
> execve .000 1
> lwp_park .634 94191
> lwp_unpark .501 94191
> sigaction .000 1
> getcontext .000 1
> setustack .000 5
> mmap .000 16
> munmap .000 2
> getrlimit .000 1
> memcntl .000 1
> sysconfig .000 2
> yield .082 12845
> lwp_create .000 4
> lwp_exit .000 4
> lwp_continue .000 4
> lwp_sigmask .000 4
> lwp_private .000 1
> lwp_wait .000 4
> lwp_mutex_wakeup .000 10
> nanosleep .000 1
> schedctl .000 5
> resolvepath .000 4
> lwp_mutex_timedlock .000 68
> getcwd .000 1
> -------- ------ ----
> sys totals: 1.219 201391 8
> usr time: 17.809
> elapsed: 10.030
> plockstat: pid 13499 has exited
>
> Mutex hold
>
> Count nsec Lock Caller
> -------------------------------------------------------------------------------
> 257123 8166 truss`truss_lock truss`worker_thread+0x3a7
> 145651 11250 truss`truss_lock truss`worker_thread+0x3a7
> 201384 1893 truss`count_lock truss`sysexit+0x82d
> 1 2468522 libc.so.1`_uberdata+0x60 libproc.so.1`Pxcreate+0x63
> 1 2447046 libnsl.so.1`_ti_userlock libnsl.so.1`_libnsl_parent_atfork+0x10
> 1 2380540 libc.so.1`_uberdata+0xfc0 libc.so.1`fork+0x130
> 1 2364179 libc.so.1`_first_link_lock libc.so.1`fork+0x130
> 1 2356270 libc.so.1`libc_malloc_lock libc.so.1`fork+0x130
> 1 2337553 libc.so.1`_uberdata+0x40 libproc.so.1`Pxcreate+0x63
> 4 52074 0x455478 libproc.so.1`Lgrab+0x234
> 1 144394 libc.so.1`_uberdata+0xfc0 truss`main_thread+0x1e8
> 4 32702 truss`truss_lock truss`worker_thread+0x5d6
> 5 23212 0x455478 libproc.so.1`Lfree+0x3a
> 1 66469 0xfef9e1b0 0xfef06df3
> 1 64668 0xfefa12c0 0xfef67a91
> 32 1858 truss`_iob+0xa8 truss`report+0x223
> 1 58652 0x455478 libproc.so.1`Lgrab+0x234
> 5 11034 libc.so.1`_uberdata libc.so.1`_thr_exit_common+0x9c
> 32 1583 truss`_iob+0xa8 truss`prtim+0x51
> 32 1547 truss`_iob+0xa8 truss`report+0x25e
> 2 23108 libc.so.1`libc_malloc_lock truss`my_realloc+0x18
> 32 1412 truss`_iob+0xa8 truss`prtim+0x64
> 1 40708 0xfefa12a0 0xfef021df
> 1 40112 libc.so.1`_uberdata+0x60 LM1`ld.so.1`rt_thr_init+0x29
> 1 34837 libc.so.1`_first_link_lock libc.so.1`_exithandle+0x3b
> 5 6891 truss`truss_lock truss`main_thread+0x116
> 1 34270 0xfef9e5d0 0xfef14052
> 4 8244 libc.so.1`_uberdata+0x40 truss`worker_thread+0xe52
> 1 31319 libc.so.1`libc_malloc_lock truss`my_realloc+0x18
> 5 6235 libc.so.1`_uberdata truss`main_thread+0x125
> 4 7557 libc.so.1`libc_malloc_lock libproc.so.1`Lgrab+0xd3
> 5 6010 libc.so.1`_uberdata+0x1040 libc.so.1`_lwp_start
> 1 29687 libc.so.1`libc_malloc_lock truss`main+0x4ac
> 1 28272 libc.so.1`mktemp_lock.0 truss`main+0xdb4
> 1 26436 truss`truss_lock truss`worker_thread+0x5d6
> 1 25322 truss`_iob+0xa8 libc.so.1`_exithandle+0x3b
> 1 25197 libc.so.1`libc_malloc_lock truss`my_realloc+0x18
> 4 6142 libc.so.1`_uberdata truss`worker_thread+0xe34
> 1 21847 libc.so.1`libc_malloc_lock truss`my_calloc+0x18
> 1 19740 libproc.so.1`minfd_lock.0 libproc.so.1`set_minfd+0x46
> 1 17725 0xfffffd7fff060140 libc.so.1`_thr_exit_common+0x9c
> 1 14834 libc.so.1`libc_malloc_lock truss`free_private+0x44
> 5 2865 libc.so.1`libc_malloc_lock truss`my_realloc+0x18
> 1 12900 0xfffffd7fff200080 truss`Xcritical+0x46
> 5 2470 libc.so.1`_uberdata libc.so.1`_thr_exit_common+0x9c
> 1 12239 libc.so.1`__sbrk_lock libc.so.1`_malloc_unlocked+0x10d
> 1 12155 0xfef9e1f0 0xfef0703b
> 1 12062 libc.so.1`libc_malloc_lock truss`my_realloc+0x18
> 6 2004 truss`count_lock truss`worker_thread+0x108a
> 1 11918 libc.so.1`_uberdata+0x40 truss`main_thread+0x1ed
> 5 2347 truss`count_lock truss`worker_thread+0xb09
> 1 11608 0xfef9f708 0xfef26ac9
> 5 2215 libc.so.1`libc_malloc_lock libproc.so.1`Lfree_internal+0x67
> 1 10685 libc.so.1`update_lock libctf.so.1`_libctf_init+0x10
> 1 10092 libc.so.1`_uberdata libproc.so.1`Pxcreate+0x63
> 5 2016 libc.so.1`_uberdata+0x1040 libc.so.1`_thrp_unwind+0x5e
> 1 9656 0x455478 libproc.so.1`Pfree+0x117
> 2 4810 libc.so.1`__sbrk_lock libc.so.1`realloc+0x235
> 5 1871 0xfffffd7fff200080 truss`Xcritical+0x46
> 6 1543 libc.so.1`_uberdata+0xd40 libc.so.1`exit+0x11
> 5 1744 libc.so.1`libc_malloc_lock truss`free_private+0x55
> 5 1738 libc.so.1`_uberdata+0x1000 libc.so.1`_thrp_unwind+0x5e
> 5 1688 libc.so.1`libc_malloc_lock truss`free_private+0x66
> 1 8407 truss`count_lock truss`setup_basetime+0x60
> 1 8252 libc.so.1`libc_malloc_lock truss`my_realloc+0x18
> 2 4044 libc.so.1`__sbrk_lock libc.so.1`malloc+0x2d
> 1 8072 libc.so.1`libc_malloc_lock libproc.so.1`Lgrab+0x72
> 5 1612 libc.so.1`_uberdata+0x1000 libc.so.1`_thrp_unwind+0x5e
> 1 8046 libc.so.1`libc_malloc_lock truss`my_realloc+0x18
> 5 1543 libc.so.1`libc_malloc_lock truss`my_realloc+0x18
> 2 3818 libc.so.1`__sbrk_lock libc.so.1`realloc+0x235
> 4 1897 libc.so.1`_uberdata libc.so.1`thr_create+0x3f
> 1 7516 0xfef9e5d0 0xfef14052
> 5 1497 libc.so.1`libc_malloc_lock truss`my_realloc+0x18
> 1 7375 0xfffffd7fff060080 truss`main_thread+0x1ed
> 1 7372 libc.so.1`_uberdata truss`create_thread+0x58
> 1 6977 libc.so.1`_uberdata+0xd40 libc.so.1`libc_init+0x2c6
> 1 6664 0xfefa1f80 0xfef63d56
> 4 1663 libc.so.1`_uberdata+0xe80 libc.so.1`find_stack+0x18a
> 1 6569 0xfffffd7fff060180 libc.so.1`_thr_exit_common+0x9c
> 2 3186 libc.so.1`__sbrk_lock libc.so.1`malloc+0x2d
> 1 6260 libc.so.1`_uberdata+0x1000 truss`main+0xf8
> 1 5847 libc.so.1`libc_malloc_lock libproc.so.1`Lgrab+0xd3
> 1 5799 libc.so.1`_uberdata+0x60 libnsl.so.1`_libnsl_lock_init+0x86
> 1 5664 0xfffffd7fff060140 truss`worker_thread+0xe52
> 1 5502 0xfefa2240 0xfef734b2
> 4 1338 libc.so.1`_uberdata libc.so.1`_thrp_create+0x531
> 1 5248 0xfffffd7fff0600c0 truss`worker_thread+0xe52
> 1 5204 libc.so.1`__sbrk_lock libc.so.1`_malloc_unlocked+0x10d
> 1 5141 truss`count_lock truss`per_proc_init+0xe1
> 1 5105 0xfefa10e0 0xfef62237
> 1 4904 0xfefa10e0 0xfef62237
> 3 1613 libc.so.1`libc_malloc_lock truss`free_private+0x44
> 1 4403 libc.so.1`_uberdata+0x140 truss`main+0xbeb
> 1 4245 0xfffffd7fff060100 truss`worker_thread+0xe52
> 1 3702 0xfffffd7fff0600c0 libc.so.1`_thr_exit_common+0x9c
> 1 3469 truss`_iob+0xa8 truss`report+0x418
> 2 1702 libc.so.1`libc_malloc_lock truss`my_realloc+0x18
> 1 3396 0xfffffd7fff060100 libc.so.1`_thr_exit_common+0x9c
> 2 1641 libc.so.1`_uberdata+0xe40 libc.so.1`_thrp_unwind+0x5e
> 2 1514 truss`_iob+0xa8 truss`report+0x366
> 2 1477 libc.so.1`libc_malloc_lock truss`my_realloc+0x18
> 2 1369 libc.so.1`libc_malloc_lock truss`sysentry+0x4e7
> 1 2653 0xfffffd7fff060080 libc.so.1`thr_create+0x3f
> 1 2646 libc.so.1`_uberdata+0x140 truss`main+0xbff
> 1 2528 truss`truss_lock truss`main_thread+0x116
> 1 2470 libc.so.1`_uberdata+0x580 truss`main+0x554
> 1 2452 0xfffffd7fff200080 truss`Xcritical+0x46
> 1 2391 0xfffffd7fff060080 libc.so.1`_thr_exit_common+0x9c
> 1 2223 libc.so.1`_uberdata+0x440 truss`main+0x545
> 1 2219 libc.so.1`libc_malloc_lock truss`my_realloc+0x18
> 1 2203 libc.so.1`_uberdata+0x4c0 truss`main+0x518
> 1 2187 libc.so.1`_uberdata+0x500 truss`main+0x527
> 1 2185 libc.so.1`_uberdata+0xd40 libc.so.1`_thrp_create+0x370
> 1 2182 0xfefa2280 0xfef73271
> 1 2127 0xfffffd7fff060080 truss`main_thread+0x125
> 1 2113 0xfffffd7fff060180 truss`worker_thread+0xe52
> 1 2090 libc.so.1`_uberdata+0x1040 LM1`ld.so.1`rt_thr_init+0x29
> 1 2082 libc.so.1`libc_malloc_lock truss`my_realloc+0x18
> 1 2072 libc.so.1`_uberdata+0x540 truss`main+0x536
> 1 2057 libc.so.1`_uberdata+0x180 truss`main+0xc0e
> 1 2041 libc.so.1`_uberdata+0x1c0 truss`main+0xc31
> 1 2035 0xfffffd7fff060140 truss`main_thread+0x125
> 1 2013 0xfffffd7fff060140 libc.so.1`thr_create+0x3f
> 1 1968 libc.so.1`libc_malloc_lock truss`main+0xdf1
> 1 1961 0xfefa2000 0xfef63d56
> 1 1954 libc.so.1`_uberdata libproc.so.1`Pxcreate+0x63
> 1 1927 0xfffffd7fff0600c0 truss`main_thread+0x125
> 1 1923 libc.so.1`_uberdata+0x180 truss`main+0xc22
> 1 1918 0xfffffd7fff060100 libc.so.1`thr_create+0x3f
> 1 1917 truss`_iob+0xa8 truss`prtim+0x20
> 1 1891 0xfffffd7fff060180 truss`main_thread+0x125
> 1 1890 libc.so.1`_uberdata+0x1c0 truss`main+0x507
> 1 1884 libc.so.1`seed_lk truss`main+0xdb4
> 1 1877 libc.so.1`libc_malloc_lock libproc.so.1`Pfree+0x10f
> 1 1875 libc.so.1`libc_malloc_lock libproc.so.1`Pxcreate+0x52
> 1 1867 libc.so.1`libc_malloc_lock truss`main+0xdb4
> 1 1832 0xfffffd7fff0600c0 libc.so.1`thr_create+0x3f
> 1 1830 0xfffffd7fff060100 truss`main_thread+0x125
> 1 1770 libc.so.1`_uberdata+0xfc0 libc.so.1`thr_create+0x3f
> 1 1765 libc.so.1`libc_malloc_lock truss`my_realloc+0x18
> 1 1754 truss`_iob+0x28 libc.so.1`_exithandle+0x3b
> 1 1732 libc.so.1`_uberdata+0xd40 truss`main+0xf8
> 1 1723 libc.so.1`_uberdata+0xd40 truss`_start+0x1b
> 1 1695 truss`_iob+0xa8 truss`report+0x4af
> 1 1681 libc.so.1`_uberdata+0xd40 libnsl.so.1`_libnsl_lock_init+0x86
> 1 1674 libc.so.1`_uberdata+0xe80 libc.so.1`_thrp_create+0x531
> 1 1649 0xfefa1f80 0xfef63d56
> 1 1648 truss`_iob+0xa8 truss`report+0x492
> 1 1647 0xfef9e5d0 0xfef14052
> 1 1642 0xfffffd7fff200080 truss`Xcritical+0x46
> 1 1640 truss`_iob+0xa8 truss`report+0x486
> 1 1637 truss`_iob+0xa8 truss`prtim+0x20
> 1 1624 0xfffffd7fff060180 libc.so.1`thr_create+0x3f
> 1 1616 0xfefa2200 0xfef01d17
> 1 1608 libc.so.1`_uberdata truss`create_thread+0x58
> 1 1608 truss`_iob+0xa8 truss`prtim+0x51
> 1 1547 truss`_iob+0xa8 truss`report+0x53e
> 1 1547 truss`_iob+0xa8 truss`prtim+0x51
> 1 1533 0xfefa2040 0xfef63d56
> 1 1518 0xfefa1f80 0xfef63d56
> 1 1505 libc.so.1`libc_malloc_lock libproc.so.1`Pfree+0x1cf
> 1 1488 0xfefa1f80 0xfef63d56
> 1 1466 truss`_iob+0xa8 truss`prtim+0x51
> 1 1445 libc.so.1`libc_malloc_lock truss`free_private+0x88
> 1 1443 libc.so.1`libc_malloc_lock truss`my_realloc+0x18
> 1 1426 truss`_iob+0xa8 truss`report+0x4cb
> 1 1424 truss`_iob+0xa8 truss`prtim+0x20
> 1 1415 0xfef9e5d0 0xfef14052
> 1 1404 libc.so.1`_uberdata+0xfc0 libc.so.1`thr_create+0x3f
> 1 1397 libc.so.1`_uberdata+0xd40 libc.so.1`_thrp_create+0x370
> 1 1378 libc.so.1`_uberdata+0xd40 libc.so.1`_thrp_create+0x370
> 1 1372 libc.so.1`_uberdata libc.so.1`thr_create+0x3f
> 1 1363 libc.so.1`libc_malloc_lock truss`free_private+0x90
> 1 1345 libc.so.1`_uberdata+0xd40 truss`_start+0x27
> 1 1343 0xfefa1f80 0xfef63d56
> 1 1341 libc.so.1`_uberdata+0xfc0 truss`_start+0x1b
> 1 1340 0xfefa2200 0xfef01d17
> 1 1324 0xfefa1f80 0xfef63d56
> 1 1284 libc.so.1`_uberdata+0xfc0 libc.so.1`thr_create+0x3f
> 1 1142 libc.so.1`_uberdata+0xfc0 truss`_start+0x27
>
> Mutex block
>
> Count nsec Lock Caller
> -------------------------------------------------------------------------------
> 124379 34360 truss`truss_lock truss`worker_thread+0x27d
> 88153 2193 truss`truss_lock truss`worker_thread+0x27d
> 3 55924 truss`truss_lock truss`worker_thread+0x5c4
> 1 12080 libc.so.1`libc_malloc_lock libc.so.1`free+0x18
> 1 2286 truss`truss_lock truss`main_thread+0xed
> $
>
> On 7/1/08, [EMAIL PROTECTED] <[EMAIL PROTECTED]> wrote:
>
>> Could you please provide lockstat information from plockstat for each of
>> these cases. I think I know what you are seeing, but I want data to back
>> up the theory.
>>
>> Thanks
>>
>> Dave Valin
>>
>>
>> senthil ramanujam wrote:
>>
>>
>>> Hi,
>>>
>>> I am seeing a strange performance behavior with pthread mutex
>>> operations on my Solaris (4-core) system running S10U5.
>>>
>>> The issue is that if the program is run with truss enabled, I am
>>> getting about 3 times more the performance of the run without truss
>>> enabled. Strange...right?
>>>
>>> The C program does the following:
>>> - accept 2 parameters [number of concurrent threads(N), and run-time]
>>> - create and run N threads simultaneously for <run-time> seconds
>>> - each thread does the following:
>>> - acquires the pthread mutex
>>> - increases the value of it's _own_ long variable by 1
>>> - releases the pthread mutex
>>> - after run-time expires, each thread reports the value of it's long variable
>>>
>>> Note that I did not use any special flags to compile it. Reply back to
>>> this email if you like to take a look at the program source code.
>>>
>>> Let's look at the program without truss enabled.
>>>
>>> $ mutex_ops -w4 -r10
>>> number of threads:4, run-time (in seconds)=10
>>> Thread 1: total ops: 4580775, ops/second: 458077
>>> Thread 3: total ops: 3798001, ops/second: 379800
>>> Thread 0: total ops: 4867850, ops/second: 486785
>>> Thread 2: total ops: 4925299, ops/second: 492529
>>> $
>>> <<vmstat 1>>
>>> kthr memory page disk faults cpu
>>> r b w swap free re mf pi po fr de sr s0 s1 -- -- in sy cs us sy id
>>> 0 0 0 17121968 14961748 22 205 0 0 0 0 0 0 0 0 0 7960 16218 16155 28 2 71
>>> 0 0 0 17121968 14961748 0 0 0 0 0 0 0 0 0 0 0 19914 41572 41568 81 3 16
>>> 1 0 0 17121968 14961748 0 0 0 0 0 0 0 0 0 0 0 18711 39086 38937 80 3 17
>>> 0 0 0 17121968 14961748 0 0 0 0 0 0 0 0 0 0 0 19006 39373 39382 80 3 17
>>> 0 0 0 17121968 14961748 0 0 0 0 0 0 0 0 0 0 0 18719 39151 39007 80 3 16
>>> 0 0 0 17121968 14961748 0 0 0 0 0 0 0 0 0 0 0 19560 41243 40963 81 4 16
>>> 0 0 0 17121968 14961748 0 0 0 0 0 0 0 0 0 0 0 20498 43671 43203 81 4 16
>>> 0 0 0 17121968 14961748 0 0 0 0 0 0 0 0 0 0 0 20857 44636 44014 80 4 15
>>> 0 0 0 17121968 14961748 0 0 0 0 0 0 0 0 0 0 0 19149 41239 40583 80 4 16
>>> 0 0 0 17121968 14961748 0 0 0 0 0 0 0 0 0 0 0 21019 44189 43995 81 4 15
>>> 0 0 0 17122372 14962176 7 29 0 0 0 0 0 0 0 0 0 13062 27003 26973 46 2 52
>>> 0 0 0 17122372 14962176 0 0 0 0 0 0 0 0 0 0 0 461 160 274 0 0 100
>>>
>>> As you can see above that I am executing the program with 4 concurrent
>>> threads for 10 seconds. The output shows that each thread has acquired
>>> the mutex lock about 400,000 times a second. The vmstat shows that the
>>> program used about 80% of usertime, 4%system and ~15% idle. So far so
>>> good.
>>>
>>> Now, let's look into program with truss case.
>>>
>>> $ truss -fc mutex_ops -w4 -r10
>>> number of threads:4, run-time (in seconds)=10
>>> Thread 0: total ops: 13419838, ops/second: 1341983
>>> Thread 2: total ops: 13959115, ops/second: 1395911
>>> Thread 1: total ops: 13414291, ops/second: 1341429
>>> Thread 3: total ops: 13535670, ops/second: 1353567
>>>
>>> syscall seconds calls errors
>>> _exit .000 1
>>> write .000 5
>>> open .000 3 1
>>> close .000 2
>>> stat .000 10 7
>>> getpid .000 1
>>> fstat .000 1
>>> ioctl .000 1
>>> execve .000 1
>>> lwp_park .905 106423
>>> lwp_unpark .757 106423
>>> sigaction .000 1
>>> getcontext .000 1
>>> setustack .000 5
>>> mmap .000 16
>>> munmap .000 2
>>> getrlimit .000 1
>>> memcntl .000 1
>>> sysconfig .000 2
>>> yield .182 26004
>>> lwp_create .000 4
>>> lwp_exit .000 4
>>> lwp_continue .000 4
>>> lwp_sigmask .000 4
>>> lwp_private .000 1
>>> lwp_wait .000 4
>>> lwp_mutex_wakeup .000 9
>>> nanosleep .000 1
>>> schedctl .000 5
>>> resolvepath .000 4
>>> lwp_mutex_timedlock .000 32
>>> getcwd .000 1
>>> -------- ------ ----
>>> sys totals: 1.846 238977 8
>>> usr time: 21.363
>>> elapsed: 10.060
>>> $
>>>
>>> <<vmstat 1>>
>>> kthr memory page disk faults cpu
>>> r b w swap free re mf pi po fr de sr s0 s1 -- -- in sy cs us sy id
>>> 0 0 0 17122372 14962176 0 0 0 0 0 0 0 0 0 0 0 529 262 335 0 0 100
>>> 0 0 0 17121240 14960660 153 632 0 0 0 0 0 0 0 0 0 25259 91754 87992 37 15 48
>>> 0 0 0 17121104 14960524 0 0 0 0 0 0 0 0 0 0 0 62751 178228 179102 57 33 10
>>> 0 0 0 17121104 14960524 0 0 0 0 0 0 0 0 0 0 0 56922 169565 168507 59 30 11
>>> 0 0 0 17121104 14960524 0 0 0 0 0 0 0 0 0 0 0 60935 178967 178535 58 32 10
>>> 0 0 0 17121104 14960524 0 0 0 0 0 0 0 0 0 0 0 61581 180239 179751 58 32 10
>>> 0 0 0 17121104 14960524 0 0 0 0 0 0 0 0 0 0 0 61486 181664 180513 58 33 10
>>> 1 0 0 17121104 14960524 0 0 0 0 0 0 0 0 0 0 0 67941 186205 190703 55 35 10
>>> 0 0 0 17121104 14960524 0 0 0 0 0 0 0 0 0 0 0 59981 175708 175758 58 32 10
>>> 1 0 0 17121104 14960524 0 0 0 0 0 0 0 0 0 0 0 60485 174557 175091 58 32 11
>>> 1 0 0 17121104 14960524 0 0 0 0 0 0 0 0 0 0 0 62799 178328 179880 57 32 10
>>> 0 0 0 17122236 14962040 8 17 0 0 0 0 0 0 0 0 0 19483 58122 58317 19 10 71
>>> 0 0 0 17122236 14962040 0 0 0 0 0 0 0 0 0 0 0 449 160 279 0 0 100
>>>
>>> Can you spot the issue? I used the same parameters(4 concurrent
>>> threads, 10 seconds run-time), but this time *with truss enabled*. The
>>> output shows that each thread has acquired the mutex lock about
>>> 1,300,000 times a second. vmstat shows that the program is using a lot
>>> more system time than before. However, I don't know how truss can
>>> cause the system time to go up, which ultimately increases the
>>> performance. Am I missing something?
>>>
>>> BTW, I confirmed the mutex locks generated by the program using
>>> dtrace. The values are indeed correct for both the cases.
>>>
>>> Any inputs/comments are appreciated.
>>>
>>> thanks.
>>>
>>> senthil
>>> _______________________________________________
>>> perf-discuss mailing list
>>> perf-discuss@opensolaris.org
>>>
>>
_______________________________________________
perf-discuss mailing list
perf-discuss@opensolaris.org