Ok, this pretty much verifies what I thought.  In the short, by running 
truss you have
slowed things down enough in the application to significantly remove the 
lock contention
on the program's mutex.  This is one of those performance items that 
appears to be contradictory
on the surface.

A bit more on this now:

Your program has a very tight loop, you acquire the lock, increment a 
variable and drop the
lock.  There is a whole slew of atomic_* operations that essentially 
does the same thing, just
more efficient (if they can be used depends on the case). 

By being a tight loop as it is, during the operation of the program 
without truss  it ends up having
the threads queuing up on each other in attempt to acquire the lock.  
When we miss on a lock because
it is owned by another thread we end up going into a totally different  
set of lock routines which is
expensive.  By running truss what you have done is slowed things down 
enough to reduce the
lock contention being seen on the mutex in your program and the 
likeliness of entering the "slow" lock
path.  If you were to add work before or after the mutex operation you 
will see the same effect, and probably
even to a greater degree. 

Hope this helps.

Dave Valin

senthil ramanujam wrote:
> Hi Dave,
>
> Here is the plockstat data for both the cases. Let me know if you like
> to see more details.
>
> Looking forward to hear your comments.
>
> thanks.
>
> senthil
>
>
> $ plockstat -A mutex_ops -w4 -r10
>    number of threads:4,   run-time (in seconds)=10
>    Thread    2:   total ops: 456161,   ops/second:    45616
>    Thread    0:   total ops: 452104,   ops/second:    45210
>    Thread    1:   total ops: 167338,   ops/second:    16733
>    Thread    3:   total ops: 174826,   ops/second:    17482
> plockstat: pid 13497 has exited
>
> Mutex hold
>
> Count     nsec Lock                         Caller
> -------------------------------------------------------------------------------
> 1248577     1639 mutex_ops`worker_mutex       mutex_ops`do_run+0x1b2
>  1851     1960 mutex_ops`worker_mutex       mutex_ops`do_run+0x1b2
>     3   104789 mutex_ops`_iob+0xa8          mutex_ops`do_run+0x25a
>     1    82008 libc.so.1`_uberdata+0xfc0    0x400e16
>     1    42599 libc.so.1`_first_link_lock   libc.so.1`_exithandle+0x3b
>     1    39967 libc.so.1`_uberdata+0x60     LM1`ld.so.1`rt_thr_init+0x29
>     4     9374 libc.so.1`_uberdata+0x1040   libc.so.1`_lwp_start
>     4     8402 libc.so.1`_uberdata+0x40     libc.so.1`pthread_create+0x135
>     1    32844 mutex_ops`_iob+0x28          libc.so.1`_exithandle+0x3b
>     3     9736 libc.so.1`_uberdata          libc.so.1`_thr_exit_common+0x9c
>     4     6343 libc.so.1`_uberdata          mutex_ops`main+0x2e7
>     1    25319 mutex_ops`_iob+0xa8          mutex_ops`do_run+0x25a
>     4     6162 libc.so.1`_uberdata          mutex_ops`main+0x22e
>     4     4369 libc.so.1`_uberdata          libc.so.1`_thr_exit_common+0x9c
>     2     6982 mutex_ops`worker_mutex       mutex_ops`do_run+0xea
>     6     1524 libc.so.1`_uberdata+0xd40    libc.so.1`exit+0x11
>     1     8422 libc.so.1`_uberdata          libc.so.1`_thr_exit_common+0x9c
>     4     1934 libc.so.1`_uberdata+0x1040   libc.so.1`_thrp_unwind+0x5e
>     1     7646 0xfffffd7fff220100           libc.so.1`_thr_exit_common+0x9c
>     2     3517 mutex_ops`worker_mutex       mutex_ops`do_run+0xea
>     1     6603 libc.so.1`_uberdata+0xd40    libc.so.1`libc_init+0x2c6
>     4     1516 libc.so.1`_uberdata+0x1000   libc.so.1`_thrp_unwind+0x5e
>     1     6029 0xfffffd7fff220080           libc.so.1`pthread_create+0x135
>     1     5789 0xfffffd7fff220080           libc.so.1`_thr_exit_common+0x9c
>     1     4387 0xfffffd7fff2200c0           libc.so.1`pthread_create+0x135
>     3     1447 libc.so.1`_uberdata+0xe80    libc.so.1`find_stack+0x18a
>     3     1396 libc.so.1`_uberdata          libc.so.1`_thrp_create+0x531
>     1     4078 0xfffffd7fff220100           libc.so.1`pthread_create+0x135
>     3     1309 libc.so.1`_uberdata          libc.so.1`pthread_create+0xfe
>     1     3900 0xfffffd7fff220140           libc.so.1`pthread_create+0x135
>     1     3811 0xfffffd7fff2200c0           libc.so.1`_thr_exit_common+0x9c
>     1     3436 0xfffffd7fff220140           libc.so.1`_thr_exit_common+0x9c
>     1     2827 0xfffffd7fff220080           libc.so.1`pthread_create+0xfe
>     2     1367 libc.so.1`_uberdata+0xe40    libc.so.1`_thrp_unwind+0x5e
>     1     2369 libc.so.1`_uberdata+0xd40    libc.so.1`_thrp_create+0x370
>     1     2341 mutex_ops`worker_mutex       mutex_ops`do_run+0x1b2
>     1     2311 libc.so.1`_uberdata+0x1040   LM1`ld.so.1`rt_thr_init+0x29
>     1     2056 0xfffffd7fff220100           mutex_ops`main+0x2e7
>     1     1951 0xfffffd7fff2200c0           mutex_ops`main+0x2e7
>     1     1863 0xfffffd7fff220140           mutex_ops`main+0x2e7
>     1     1803 0xfffffd7fff220140           libc.so.1`pthread_create+0xfe
>     1     1794 libc.so.1`_uberdata          mutex_ops`main+0x22e
>     1     1782 0xfffffd7fff220100           libc.so.1`pthread_create+0xfe
>     1     1671 0xfffffd7fff2200c0           libc.so.1`pthread_create+0xfe
>     1     1622 mutex_ops`_iob+0xa8          libc.so.1`_exithandle+0x3b
>     1     1510 0xfffffd7fff220080           mutex_ops`main+0x2e7
>     1     1508 libc.so.1`_uberdata+0xe80    libc.so.1`_thrp_create+0x531
>     1     1506 libc.so.1`_uberdata+0xd40    0x400dbb
>     1     1495 libc.so.1`_uberdata+0xfc0    0x400dbb
>     1     1446 libc.so.1`_uberdata+0xfc0    libc.so.1`pthread_create+0xfe
>     1     1398 libc.so.1`_uberdata+0xd40    libc.so.1`_thrp_create+0x370
>     1     1389 libc.so.1`_uberdata          libc.so.1`pthread_create+0xfe
>     1     1370 libc.so.1`_uberdata+0xfc0    libc.so.1`pthread_create+0xfe
>     1     1347 libc.so.1`_uberdata+0xd40    0x400dc7
>     1     1310 libc.so.1`_uberdata+0xfc0    0x400dc7
>     1     1194 libc.so.1`_uberdata+0xfc0    libc.so.1`pthread_create+0xfe
>     1     1177 libc.so.1`_uberdata+0xd40    libc.so.1`_thrp_create+0x370
>
> Mutex block
>
> Count     nsec Lock                         Caller
> -------------------------------------------------------------------------------
>   866 21921703 mutex_ops`worker_mutex       mutex_ops`do_run+0x150
> 692495     8328 mutex_ops`worker_mutex       mutex_ops`do_run+0x150
>     2 11086954 mutex_ops`worker_mutex       mutex_ops`do_run+0x94
>     3   219879 mutex_ops`_iob+0xa8          libc.so.1`_flockget+0x1a
>     1    12013 libc.so.1`_uberdata          mutex_ops`main+0x2e7
>     1     2094 libc.so.1`_uberdata          libc.so.1`_thr_exit_common+0x9c
>
>
> << trusss + mutex >>
> $ plockstat -A truss -fc mutex_ops -w4 -r10
>    number of threads:4,   run-time (in seconds)=10
>    Thread    2:   total ops: 17918909,   ops/second:  1791890
>    Thread    0:   total ops: 18518051,   ops/second:  1851805
>    Thread    1:   total ops: 18280090,   ops/second:  1828009
>    Thread    3:   total ops: 18889775,   ops/second:  1888977
>
> syscall               seconds   calls  errors
> _exit                    .000       1
> write                    .000       5
> open                     .000       3       1
> close                    .000       2
> stat                     .000      10       7
> getpid                   .000       1
> fstat                    .000       1
> ioctl                    .000       1
> execve                   .000       1
> lwp_park                 .634   94191
> lwp_unpark               .501   94191
> sigaction                .000       1
> getcontext               .000       1
> setustack                .000       5
> mmap                     .000      16
> munmap                   .000       2
> getrlimit                .000       1
> memcntl                  .000       1
> sysconfig                .000       2
> yield                    .082   12845
> lwp_create               .000       4
> lwp_exit                 .000       4
> lwp_continue             .000       4
> lwp_sigmask              .000       4
> lwp_private              .000       1
> lwp_wait                 .000       4
> lwp_mutex_wakeup         .000      10
> nanosleep                .000       1
> schedctl                 .000       5
> resolvepath              .000       4
> lwp_mutex_timedlock      .000      68
> getcwd                   .000       1
>                      --------  ------   ----
> sys totals:             1.219  201391      8
> usr time:              17.809
> elapsed:               10.030
> plockstat: pid 13499 has exited
>
> Mutex hold
>
> Count     nsec Lock                         Caller
> -------------------------------------------------------------------------------
> 257123     8166 truss`truss_lock             truss`worker_thread+0x3a7
> 145651    11250 truss`truss_lock             truss`worker_thread+0x3a7
> 201384     1893 truss`count_lock             truss`sysexit+0x82d
>     1  2468522 libc.so.1`_uberdata+0x60     libproc.so.1`Pxcreate+0x63
>     1  2447046 libnsl.so.1`_ti_userlock
> libnsl.so.1`_libnsl_parent_atfork+0x10
>     1  2380540 libc.so.1`_uberdata+0xfc0    libc.so.1`fork+0x130
>     1  2364179 libc.so.1`_first_link_lock   libc.so.1`fork+0x130
>     1  2356270 libc.so.1`libc_malloc_lock   libc.so.1`fork+0x130
>     1  2337553 libc.so.1`_uberdata+0x40     libproc.so.1`Pxcreate+0x63
>     4    52074 0x455478                     libproc.so.1`Lgrab+0x234
>     1   144394 libc.so.1`_uberdata+0xfc0    truss`main_thread+0x1e8
>     4    32702 truss`truss_lock             truss`worker_thread+0x5d6
>     5    23212 0x455478                     libproc.so.1`Lfree+0x3a
>     1    66469 0xfef9e1b0                   0xfef06df3
>     1    64668 0xfefa12c0                   0xfef67a91
>    32     1858 truss`_iob+0xa8              truss`report+0x223
>     1    58652 0x455478                     libproc.so.1`Lgrab+0x234
>     5    11034 libc.so.1`_uberdata          libc.so.1`_thr_exit_common+0x9c
>    32     1583 truss`_iob+0xa8              truss`prtim+0x51
>    32     1547 truss`_iob+0xa8              truss`report+0x25e
>     2    23108 libc.so.1`libc_malloc_lock   truss`my_realloc+0x18
>    32     1412 truss`_iob+0xa8              truss`prtim+0x64
>     1    40708 0xfefa12a0                   0xfef021df
>     1    40112 libc.so.1`_uberdata+0x60     LM1`ld.so.1`rt_thr_init+0x29
>     1    34837 libc.so.1`_first_link_lock   libc.so.1`_exithandle+0x3b
>     5     6891 truss`truss_lock             truss`main_thread+0x116
>     1    34270 0xfef9e5d0                   0xfef14052
>     4     8244 libc.so.1`_uberdata+0x40     truss`worker_thread+0xe52
>     1    31319 libc.so.1`libc_malloc_lock   truss`my_realloc+0x18
>     5     6235 libc.so.1`_uberdata          truss`main_thread+0x125
>     4     7557 libc.so.1`libc_malloc_lock   libproc.so.1`Lgrab+0xd3
>     5     6010 libc.so.1`_uberdata+0x1040   libc.so.1`_lwp_start
>     1    29687 libc.so.1`libc_malloc_lock   truss`main+0x4ac
>     1    28272 libc.so.1`mktemp_lock.0      truss`main+0xdb4
>     1    26436 truss`truss_lock             truss`worker_thread+0x5d6
>     1    25322 truss`_iob+0xa8              libc.so.1`_exithandle+0x3b
>     1    25197 libc.so.1`libc_malloc_lock   truss`my_realloc+0x18
>     4     6142 libc.so.1`_uberdata          truss`worker_thread+0xe34
>     1    21847 libc.so.1`libc_malloc_lock   truss`my_calloc+0x18
>     1    19740 libproc.so.1`minfd_lock.0    libproc.so.1`set_minfd+0x46
>     1    17725 0xfffffd7fff060140           libc.so.1`_thr_exit_common+0x9c
>     1    14834 libc.so.1`libc_malloc_lock   truss`free_private+0x44
>     5     2865 libc.so.1`libc_malloc_lock   truss`my_realloc+0x18
>     1    12900 0xfffffd7fff200080           truss`Xcritical+0x46
>     5     2470 libc.so.1`_uberdata          libc.so.1`_thr_exit_common+0x9c
>     1    12239 libc.so.1`__sbrk_lock        libc.so.1`_malloc_unlocked+0x10d
>     1    12155 0xfef9e1f0                   0xfef0703b
>     1    12062 libc.so.1`libc_malloc_lock   truss`my_realloc+0x18
>     6     2004 truss`count_lock             truss`worker_thread+0x108a
>     1    11918 libc.so.1`_uberdata+0x40     truss`main_thread+0x1ed
>     5     2347 truss`count_lock             truss`worker_thread+0xb09
>     1    11608 0xfef9f708                   0xfef26ac9
>     5     2215 libc.so.1`libc_malloc_lock   libproc.so.1`Lfree_internal+0x67
>     1    10685 libc.so.1`update_lock        libctf.so.1`_libctf_init+0x10
>     1    10092 libc.so.1`_uberdata          libproc.so.1`Pxcreate+0x63
>     5     2016 libc.so.1`_uberdata+0x1040   libc.so.1`_thrp_unwind+0x5e
>     1     9656 0x455478                     libproc.so.1`Pfree+0x117
>     2     4810 libc.so.1`__sbrk_lock        libc.so.1`realloc+0x235
>     5     1871 0xfffffd7fff200080           truss`Xcritical+0x46
>     6     1543 libc.so.1`_uberdata+0xd40    libc.so.1`exit+0x11
>     5     1744 libc.so.1`libc_malloc_lock   truss`free_private+0x55
>     5     1738 libc.so.1`_uberdata+0x1000   libc.so.1`_thrp_unwind+0x5e
>     5     1688 libc.so.1`libc_malloc_lock   truss`free_private+0x66
>     1     8407 truss`count_lock             truss`setup_basetime+0x60
>     1     8252 libc.so.1`libc_malloc_lock   truss`my_realloc+0x18
>     2     4044 libc.so.1`__sbrk_lock        libc.so.1`malloc+0x2d
>     1     8072 libc.so.1`libc_malloc_lock   libproc.so.1`Lgrab+0x72
>     5     1612 libc.so.1`_uberdata+0x1000   libc.so.1`_thrp_unwind+0x5e
>     1     8046 libc.so.1`libc_malloc_lock   truss`my_realloc+0x18
>     5     1543 libc.so.1`libc_malloc_lock   truss`my_realloc+0x18
>     2     3818 libc.so.1`__sbrk_lock        libc.so.1`realloc+0x235
>     4     1897 libc.so.1`_uberdata          libc.so.1`thr_create+0x3f
>     1     7516 0xfef9e5d0                   0xfef14052
>     5     1497 libc.so.1`libc_malloc_lock   truss`my_realloc+0x18
>     1     7375 0xfffffd7fff060080           truss`main_thread+0x1ed
>     1     7372 libc.so.1`_uberdata          truss`create_thread+0x58
>     1     6977 libc.so.1`_uberdata+0xd40    libc.so.1`libc_init+0x2c6
>     1     6664 0xfefa1f80                   0xfef63d56
>     4     1663 libc.so.1`_uberdata+0xe80    libc.so.1`find_stack+0x18a
>     1     6569 0xfffffd7fff060180           libc.so.1`_thr_exit_common+0x9c
>     2     3186 libc.so.1`__sbrk_lock        libc.so.1`malloc+0x2d
>     1     6260 libc.so.1`_uberdata+0x1000   truss`main+0xf8
>     1     5847 libc.so.1`libc_malloc_lock   libproc.so.1`Lgrab+0xd3
>     1     5799 libc.so.1`_uberdata+0x60     libnsl.so.1`_libnsl_lock_init+0x86
>     1     5664 0xfffffd7fff060140           truss`worker_thread+0xe52
>     1     5502 0xfefa2240                   0xfef734b2
>     4     1338 libc.so.1`_uberdata          libc.so.1`_thrp_create+0x531
>     1     5248 0xfffffd7fff0600c0           truss`worker_thread+0xe52
>     1     5204 libc.so.1`__sbrk_lock        libc.so.1`_malloc_unlocked+0x10d
>     1     5141 truss`count_lock             truss`per_proc_init+0xe1
>     1     5105 0xfefa10e0                   0xfef62237
>     1     4904 0xfefa10e0                   0xfef62237
>     3     1613 libc.so.1`libc_malloc_lock   truss`free_private+0x44
>     1     4403 libc.so.1`_uberdata+0x140    truss`main+0xbeb
>     1     4245 0xfffffd7fff060100           truss`worker_thread+0xe52
>     1     3702 0xfffffd7fff0600c0           libc.so.1`_thr_exit_common+0x9c
>     1     3469 truss`_iob+0xa8              truss`report+0x418
>     2     1702 libc.so.1`libc_malloc_lock   truss`my_realloc+0x18
>     1     3396 0xfffffd7fff060100           libc.so.1`_thr_exit_common+0x9c
>     2     1641 libc.so.1`_uberdata+0xe40    libc.so.1`_thrp_unwind+0x5e
>     2     1514 truss`_iob+0xa8              truss`report+0x366
>     2     1477 libc.so.1`libc_malloc_lock   truss`my_realloc+0x18
>     2     1369 libc.so.1`libc_malloc_lock   truss`sysentry+0x4e7
>     1     2653 0xfffffd7fff060080           libc.so.1`thr_create+0x3f
>     1     2646 libc.so.1`_uberdata+0x140    truss`main+0xbff
>     1     2528 truss`truss_lock             truss`main_thread+0x116
>     1     2470 libc.so.1`_uberdata+0x580    truss`main+0x554
>     1     2452 0xfffffd7fff200080           truss`Xcritical+0x46
>     1     2391 0xfffffd7fff060080           libc.so.1`_thr_exit_common+0x9c
>     1     2223 libc.so.1`_uberdata+0x440    truss`main+0x545
>     1     2219 libc.so.1`libc_malloc_lock   truss`my_realloc+0x18
>     1     2203 libc.so.1`_uberdata+0x4c0    truss`main+0x518
>     1     2187 libc.so.1`_uberdata+0x500    truss`main+0x527
>     1     2185 libc.so.1`_uberdata+0xd40    libc.so.1`_thrp_create+0x370
>     1     2182 0xfefa2280                   0xfef73271
>     1     2127 0xfffffd7fff060080           truss`main_thread+0x125
>     1     2113 0xfffffd7fff060180           truss`worker_thread+0xe52
>     1     2090 libc.so.1`_uberdata+0x1040   LM1`ld.so.1`rt_thr_init+0x29
>     1     2082 libc.so.1`libc_malloc_lock   truss`my_realloc+0x18
>     1     2072 libc.so.1`_uberdata+0x540    truss`main+0x536
>     1     2057 libc.so.1`_uberdata+0x180    truss`main+0xc0e
>     1     2041 libc.so.1`_uberdata+0x1c0    truss`main+0xc31
>     1     2035 0xfffffd7fff060140           truss`main_thread+0x125
>     1     2013 0xfffffd7fff060140           libc.so.1`thr_create+0x3f
>     1     1968 libc.so.1`libc_malloc_lock   truss`main+0xdf1
>     1     1961 0xfefa2000                   0xfef63d56
>     1     1954 libc.so.1`_uberdata          libproc.so.1`Pxcreate+0x63
>     1     1927 0xfffffd7fff0600c0           truss`main_thread+0x125
>     1     1923 libc.so.1`_uberdata+0x180    truss`main+0xc22
>     1     1918 0xfffffd7fff060100           libc.so.1`thr_create+0x3f
>     1     1917 truss`_iob+0xa8              truss`prtim+0x20
>     1     1891 0xfffffd7fff060180           truss`main_thread+0x125
>     1     1890 libc.so.1`_uberdata+0x1c0    truss`main+0x507
>     1     1884 libc.so.1`seed_lk            truss`main+0xdb4
>     1     1877 libc.so.1`libc_malloc_lock   libproc.so.1`Pfree+0x10f
>     1     1875 libc.so.1`libc_malloc_lock   libproc.so.1`Pxcreate+0x52
>     1     1867 libc.so.1`libc_malloc_lock   truss`main+0xdb4
>     1     1832 0xfffffd7fff0600c0           libc.so.1`thr_create+0x3f
>     1     1830 0xfffffd7fff060100           truss`main_thread+0x125
>     1     1770 libc.so.1`_uberdata+0xfc0    libc.so.1`thr_create+0x3f
>     1     1765 libc.so.1`libc_malloc_lock   truss`my_realloc+0x18
>     1     1754 truss`_iob+0x28              libc.so.1`_exithandle+0x3b
>     1     1732 libc.so.1`_uberdata+0xd40    truss`main+0xf8
>     1     1723 libc.so.1`_uberdata+0xd40    truss`_start+0x1b
>     1     1695 truss`_iob+0xa8              truss`report+0x4af
>     1     1681 libc.so.1`_uberdata+0xd40    libnsl.so.1`_libnsl_lock_init+0x86
>     1     1674 libc.so.1`_uberdata+0xe80    libc.so.1`_thrp_create+0x531
>     1     1649 0xfefa1f80                   0xfef63d56
>     1     1648 truss`_iob+0xa8              truss`report+0x492
>     1     1647 0xfef9e5d0                   0xfef14052
>     1     1642 0xfffffd7fff200080           truss`Xcritical+0x46
>     1     1640 truss`_iob+0xa8              truss`report+0x486
>     1     1637 truss`_iob+0xa8              truss`prtim+0x20
>     1     1624 0xfffffd7fff060180           libc.so.1`thr_create+0x3f
>     1     1616 0xfefa2200                   0xfef01d17
>     1     1608 libc.so.1`_uberdata          truss`create_thread+0x58
>     1     1608 truss`_iob+0xa8              truss`prtim+0x51
>     1     1547 truss`_iob+0xa8              truss`report+0x53e
>     1     1547 truss`_iob+0xa8              truss`prtim+0x51
>     1     1533 0xfefa2040                   0xfef63d56
>     1     1518 0xfefa1f80                   0xfef63d56
>     1     1505 libc.so.1`libc_malloc_lock   libproc.so.1`Pfree+0x1cf
>     1     1488 0xfefa1f80                   0xfef63d56
>     1     1466 truss`_iob+0xa8              truss`prtim+0x51
>     1     1445 libc.so.1`libc_malloc_lock   truss`free_private+0x88
>     1     1443 libc.so.1`libc_malloc_lock   truss`my_realloc+0x18
>     1     1426 truss`_iob+0xa8              truss`report+0x4cb
>     1     1424 truss`_iob+0xa8              truss`prtim+0x20
>     1     1415 0xfef9e5d0                   0xfef14052
>     1     1404 libc.so.1`_uberdata+0xfc0    libc.so.1`thr_create+0x3f
>     1     1397 libc.so.1`_uberdata+0xd40    libc.so.1`_thrp_create+0x370
>     1     1378 libc.so.1`_uberdata+0xd40    libc.so.1`_thrp_create+0x370
>     1     1372 libc.so.1`_uberdata          libc.so.1`thr_create+0x3f
>     1     1363 libc.so.1`libc_malloc_lock   truss`free_private+0x90
>     1     1345 libc.so.1`_uberdata+0xd40    truss`_start+0x27
>     1     1343 0xfefa1f80                   0xfef63d56
>     1     1341 libc.so.1`_uberdata+0xfc0    truss`_start+0x1b
>     1     1340 0xfefa2200                   0xfef01d17
>     1     1324 0xfefa1f80                   0xfef63d56
>     1     1284 libc.so.1`_uberdata+0xfc0    libc.so.1`thr_create+0x3f
>     1     1142 libc.so.1`_uberdata+0xfc0    truss`_start+0x27
>
> Mutex block
>
> Count     nsec Lock                         Caller
> -------------------------------------------------------------------------------
> 124379    34360 truss`truss_lock             truss`worker_thread+0x27d
> 88153     2193 truss`truss_lock             truss`worker_thread+0x27d
>     3    55924 truss`truss_lock             truss`worker_thread+0x5c4
>     1    12080 libc.so.1`libc_malloc_lock   libc.so.1`free+0x18
>     1     2286 truss`truss_lock             truss`main_thread+0xed
> $
>
> On 7/1/08, [EMAIL PROTECTED] <[EMAIL PROTECTED]> wrote:
>   
>> Could you please provide lockstat information from plockstat for each of
>> these
>>  cases.  I think I know what you are seeing, but I want data to back up the
>> theory.
>>
>>  Thanks
>>
>>  Dave Valin
>>
>>
>>  senthil ramanujam wrote:
>>
>>     
>>> Hi,
>>>
>>> I am seeing a strange performance behavior with pthread mutex
>>> operations on my Solaris (4-core) system running S10U5.
>>>
>>> The issue is that if the program is run with truss enabled, I am
>>> getting about 3 times more the performance of the run without truss
>>> enabled. Strange...right?
>>>
>>> The C program does the following:
>>>   - accept 2 parameters [number of concurrent threads(N), and run-time]
>>>   - create and run N threads simultaneously for <run-time> seconds
>>>   - each thread does the following:
>>>        - acquires the pthread mutex
>>>        - increases the value of it's _own_ long variable by 1
>>>        - releases the pthread mutex
>>>   - after run-time expires, each thread reports the value of it's long
>>>       
>> variable
>>     
>>> Note that I did not use any special flags to compile it. Reply back to
>>> this email if you like to take a look at the program source code.
>>>
>>> Let's look at the program without truss enabled.
>>>
>>> $ mutex_ops -w4 -r10
>>>   number of threads:4,   run-time (in seconds)=10
>>>   Thread    1:   total ops: 4580775,   ops/second:   458077
>>>   Thread    3:   total ops: 3798001,   ops/second:   379800
>>>   Thread    0:   total ops: 4867850,   ops/second:   486785
>>>   Thread    2:   total ops: 4925299,   ops/second:   492529
>>> $
>>> <<vmstat 1>>
>>>  kthr      memory            page            disk          faults      cpu
>>>  r b w   swap  free  re  mf pi po fr de sr s0 s1 -- --   in   sy   cs us
>>>       
>> sy id
>>     
>>>  0 0 0 17121968 14961748 22 205 0 0 0 0  0  0  0  0  0 7960 16218 16155 28
>>>       
>> 2 71
>>     
>>>  0 0 0 17121968 14961748 0 0 0  0  0  0  0  0  0  0  0 19914 41572 41568
>>>       
>> 81 3 16
>>     
>>>  1 0 0 17121968 14961748 0 0 0  0  0  0  0  0  0  0  0 18711 39086 38937
>>>       
>> 80 3 17
>>     
>>>  0 0 0 17121968 14961748 0 0 0  0  0  0  0  0  0  0  0 19006 39373 39382
>>>       
>> 80 3 17
>>     
>>>  0 0 0 17121968 14961748 0 0 0  0  0  0  0  0  0  0  0 18719 39151 39007
>>>       
>> 80 3 16
>>     
>>>  0 0 0 17121968 14961748 0 0 0  0  0  0  0  0  0  0  0 19560 41243 40963
>>>       
>> 81 4 16
>>     
>>>  0 0 0 17121968 14961748 0 0 0  0  0  0  0  0  0  0  0 20498 43671 43203
>>>       
>> 81 4 16
>>     
>>>  0 0 0 17121968 14961748 0 0 0  0  0  0  0  0  0  0  0 20857 44636 44014
>>>       
>> 80 4 15
>>     
>>>  0 0 0 17121968 14961748 0 0 0  0  0  0  0  0  0  0  0 19149 41239 40583
>>>       
>> 80 4 16
>>     
>>>  0 0 0 17121968 14961748 0 0 0  0  0  0  0  0  0  0  0 21019 44189 43995
>>>       
>> 81 4 15
>>     
>>>  0 0 0 17122372 14962176 7 29 0 0  0  0  0  0  0  0  0 13062 27003 26973
>>>       
>> 46 2 52
>>     
>>>  0 0 0 17122372 14962176 0 0 0  0  0  0  0  0  0  0  0  461  160  274  0
>>>       
>> 0 100
>>     
>>> As you can see above that I am executing the program with 4 concurrent
>>> threads for 10 seconds. The output shows that each thread has acquired
>>> the mutex lock about 400,000 times a second. The vmstat shows that the
>>> program used about 80% of usertime, 4%system and ~15% idle. So far so
>>> good.
>>>
>>> Now, let's look into program with truss case.
>>>
>>> $ truss -fc mutex_ops -w4 -r10
>>>   number of threads:4,   run-time (in seconds)=10
>>>   Thread    0:   total ops: 13419838,   ops/second:  1341983
>>>   Thread    2:   total ops: 13959115,   ops/second:  1395911
>>>   Thread    1:   total ops: 13414291,   ops/second:  1341429
>>>   Thread    3:   total ops: 13535670,   ops/second:  1353567
>>>
>>> syscall               seconds   calls  errors
>>> _exit                    .000       1
>>> write                    .000       5
>>> open                     .000       3       1
>>> close                    .000       2
>>> stat                     .000      10       7
>>> getpid                   .000       1
>>> fstat                    .000       1
>>> ioctl                    .000       1
>>> execve                   .000       1
>>> lwp_park                 .905  106423
>>> lwp_unpark               .757  106423
>>> sigaction                .000       1
>>> getcontext               .000       1
>>> setustack                .000       5
>>> mmap                     .000      16
>>> munmap                   .000       2
>>> getrlimit                .000       1
>>> memcntl                  .000       1
>>> sysconfig                .000       2
>>> yield                    .182   26004
>>> lwp_create               .000       4
>>> lwp_exit                 .000       4
>>> lwp_continue             .000       4
>>> lwp_sigmask              .000       4
>>> lwp_private              .000       1
>>> lwp_wait                 .000       4
>>> lwp_mutex_wakeup         .000       9
>>> nanosleep                .000       1
>>> schedctl                 .000       5
>>> resolvepath              .000       4
>>> lwp_mutex_timedlock      .000      32
>>> getcwd                   .000       1
>>>                     --------  ------   ----
>>> sys totals:             1.846  238977      8
>>> usr time:              21.363
>>> elapsed:               10.060
>>> $
>>>
>>> <<vmstat 1>>
>>>  kthr      memory            page            disk          faults      cpu
>>>  r b w   swap  free  re  mf pi po fr de sr s0 s1 -- --   in   sy   cs us
>>>       
>> sy id
>>     
>>>  0 0 0 17122372 14962176 0 0 0  0  0  0  0  0  0  0  0  529  262  335  0
>>>       
>> 0 100
>>     
>>>  0 0 0 17121240 14960660 153 632 0 0 0 0 0  0  0  0  0 25259 91754
>>> 87992 37 15 48
>>>  0 0 0 17121104 14960524 0 0 0  0  0  0  0  0  0  0  0 62751 178228
>>> 179102 57 33 10
>>>  0 0 0 17121104 14960524 0 0 0  0  0  0  0  0  0  0  0 56922 169565
>>> 168507 59 30 11
>>>  0 0 0 17121104 14960524 0 0 0  0  0  0  0  0  0  0  0 60935 178967
>>> 178535 58 32 10
>>>  0 0 0 17121104 14960524 0 0 0  0  0  0  0  0  0  0  0 61581 180239
>>> 179751 58 32 10
>>>  0 0 0 17121104 14960524 0 0 0  0  0  0  0  0  0  0  0 61486 181664
>>> 180513 58 33 10
>>>  1 0 0 17121104 14960524 0 0 0  0  0  0  0  0  0  0  0 67941 186205
>>> 190703 55 35 10
>>>  0 0 0 17121104 14960524 0 0 0  0  0  0  0  0  0  0  0 59981 175708
>>> 175758 58 32 10
>>>  1 0 0 17121104 14960524 0 0 0  0  0  0  0  0  0  0  0 60485 174557
>>> 175091 58 32 11
>>>  1 0 0 17121104 14960524 0 0 0  0  0  0  0  0  0  0  0 62799 178328
>>> 179880 57 32 10
>>>  0 0 0 17122236 14962040 8 17 0 0  0  0  0  0  0  0  0 19483 58122
>>> 58317 19 10 71
>>>  0 0 0 17122236 14962040 0 0 0  0  0  0  0  0  0  0  0  449  160  279  0
>>>       
>> 0 100
>>     
>>> Can you spot the issue? I used the same parameters(4 concurrent
>>> threads, 10 seconds run-time), but this time *with truss enabled*. The
>>> output shows that each thread has acquired the mutex lock about
>>> 1,300,000 times a second. vmstat shows that the program is using a lot
>>> more system time than before. However, I don't know how truss can
>>> cause the system time to go up, which ultimately increases the
>>> performance. Am I missing something?
>>>
>>> BTW, I confirmed the mutex locks generated by the program using
>>> dtrace. The values are indeed correct for both the cases.
>>>
>>> Any inputs/comments are appreciated.
>>>
>>> thanks.
>>>
>>> senthil
>>> _______________________________________________
>>> perf-discuss mailing list
>>> perf-discuss@opensolaris.org
>>>
>>>
>>>       
>>     

_______________________________________________
perf-discuss mailing list
perf-discuss@opensolaris.org

Reply via email to