senthil ramanujam wrote:
> Good tips. I have been testing the performance differences of system
> calls, library routines, and IPC implementations between Solaris and
> Linux. This is one of the test programs I quickly put together to show
> the difference. I'll keep your options on the table and follow up as
> soon as we get to that point.
> 
> many thanks.

Have you taken a look at libmicro? It might be just what you want!

http://opensolaris.org/os/project/libmicro/

-neel



> 
> senthil
> 
> On 7/1/08, [EMAIL PROTECTED] <[EMAIL PROTECTED]> wrote:
>> You're welcome.  If the value you are incrementing is unique to each thread,
>>  you can try forcing each one onto its own cache line:
>>
>>  int value[64];
>>  #define VALUE_1   0
>>  #define VALUE_2 16
>>  #define VALUE_3 32
>>  #define VALUE_4 48
>>
>>  where each thread uses one of those slots (16 ints apart = 64 bytes apart)
>>  as its own counter.  This removes the potential cache-line thrashing that is
>>  going on.  If all the threads share a single value, then you could instead
>>  try placing the lock and the counter together on one cache line (take the
>>  single miss for both the lock and the update):
>>
>>  struct data {
>>    pthread_mutex_t lock;     /* lock and counter share one cache line */
>>    int count;
>>    int pad_to_64[14];        /* pad so adjacent instances don't share a line */
>>  };
>>
>>  There are games you need to play with the compiler/loader to get these
>>  lined up on their own cache line; which game it is depends on the code.
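>>
>>  [One hedged example of such a game, using gcc's aligned attribute (Sun
>>  Studio's "#pragma align" can achieve a similar effect); the 64-byte line
>>  size and the names below are assumptions, not taken from the test program:]
>>
>>  #include <pthread.h>
>>
>>  #define CACHE_LINE 64
>>
>>  /* lock and shared counter kept together on a single 64-byte line */
>>  struct locked_count {
>>          pthread_mutex_t lock;
>>          int             count;
>>  } __attribute__((aligned(CACHE_LINE)));
>>
>>  /* one padded slot per thread so private counters never share a line */
>>  struct padded_count {
>>          long value;
>>          char pad[CACHE_LINE - sizeof (long)];
>>  } __attribute__((aligned(CACHE_LINE)));
>>
>>  struct padded_count counts[4];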
>>
>>  Dave V
>>
>>
>>  senthil ramanujam wrote:
>>
>>> Hi Dave,
>>>
>>> hmmm, I was kind of expecting this answer. Thanks for your analysis.
>>>
>>> senthil
>>>
>>> On 7/1/08, [EMAIL PROTECTED] <[EMAIL PROTECTED]> wrote:
>>>
>>>
>>>> Ok, this pretty much verifies what I thought.  In short, by running truss
>>>> you have slowed the application down enough to significantly reduce the
>>>> lock contention on the program's mutex.  This is one of those performance
>>>> items that appears contradictory on the surface.
>>>>
>>>>  A bit more on this now:
>>>>
>>>>  Your program has a very tight loop: you acquire the lock, increment a
>>>>  variable, and drop the lock.  There is a whole slew of atomic_* operations
>>>>  that do essentially the same thing, just more efficiently (whether they can
>>>>  be used depends on the case).
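>>>>
>>>>  [A minimal sketch of that atomic_* alternative, using the Solaris
>>>>  atomic_ops(3C) interfaces; the counter name is illustrative, not taken
>>>>  from the original program:]
>>>>
>>>>  #include <sys/types.h>
>>>>  #include <atomic.h>
>>>>
>>>>  static volatile ulong_t my_count;
>>>>
>>>>  /* replaces lock(); my_count++; unlock(); with a single atomic instruction */
>>>>  void
>>>>  bump(void)
>>>>  {
>>>>          atomic_inc_ulong(&my_count);
>>>>  }
>>>>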
>>>>  Because the loop is so tight, when the program runs without truss the
>>>>  threads end up queuing on each other while attempting to acquire the lock.
>>>>  When we miss on a lock because it is owned by another thread, we drop into
>>>>  a totally different, and much more expensive, set of lock routines.  By
>>>>  running truss you have slowed things down enough to reduce the lock
>>>>  contention seen on your program's mutex and the likelihood of entering the
>>>>  "slow" lock path.  If you were to add work before or after the mutex
>>>>  operation you would see the same effect, probably to an even greater
>>>>  degree.
>>>>  Hope this helps.
>>>>
>>>>
>>>>  Dave Valin
>>>>
>>>>  senthil ramanujam wrote:
>>>>
>>>>
>>>>
>>>>> Hi Dave,
>>>>>
>>>>> Here is the plockstat data for both cases. Let me know if you would like
>>>>> to see more details.
>>>>>
>>>>> Looking forward to hearing your comments.
>>>>>
>>>>> thanks.
>>>>>
>>>>> senthil
>>>>>
>>>>>
>>>>> $ plockstat -A mutex_ops -w4 -r10
>>>>>  number of threads:4,   run-time (in seconds)=10
>>>>>  Thread    2:   total ops: 456161,   ops/second:    45616
>>>>>  Thread    0:   total ops: 452104,   ops/second:    45210
>>>>>  Thread    1:   total ops: 167338,   ops/second:    16733
>>>>>  Thread    3:   total ops: 174826,   ops/second:    17482
>>>>> plockstat: pid 13497 has exited
>>>>>
>>>>> Mutex hold
>>>>>
>>>>> Count     nsec Lock                         Caller
>>>>>
>>>>>
>>>>>
>>>>> -------------------------------------------------------------------------------
>>>>> 1248577     1639 mutex_ops`worker_mutex       mutex_ops`do_run+0x1b2
>>>>>  1851     1960 mutex_ops`worker_mutex       mutex_ops`do_run+0x1b2
>>>>>   3   104789 mutex_ops`_iob+0xa8          mutex_ops`do_run+0x25a
>>>>>   1    82008 libc.so.1`_uberdata+0xfc0    0x400e16
>>>>>   1    42599 libc.so.1`_first_link_lock   libc.so.1`_exithandle+0x3b
>>>>>   1    39967 libc.so.1`_uberdata+0x60     LM1`ld.so.1`rt_thr_init+0x29
>>>>>   4     9374 libc.so.1`_uberdata+0x1040   libc.so.1`_lwp_start
>>>>>   4     8402 libc.so.1`_uberdata+0x40     libc.so.1`pthread_create+0x135
>>>>>   1    32844 mutex_ops`_iob+0x28          libc.so.1`_exithandle+0x3b
>>>>>   3     9736 libc.so.1`_uberdata          libc.so.1`_thr_exit_common+0x9c
>>>>>   4     6343 libc.so.1`_uberdata          mutex_ops`main+0x2e7
>>>>>   1    25319 mutex_ops`_iob+0xa8          mutex_ops`do_run+0x25a
>>>>>   4     6162 libc.so.1`_uberdata          mutex_ops`main+0x22e
>>>>>   4     4369 libc.so.1`_uberdata          libc.so.1`_thr_exit_common+0x9c
>>>>>   2     6982 mutex_ops`worker_mutex       mutex_ops`do_run+0xea
>>>>>   6     1524 libc.so.1`_uberdata+0xd40    libc.so.1`exit+0x11
>>>>>   1     8422 libc.so.1`_uberdata          libc.so.1`_thr_exit_common+0x9c
>>>>>   4     1934 libc.so.1`_uberdata+0x1040   libc.so.1`_thrp_unwind+0x5e
>>>>>   1     7646 0xfffffd7fff220100           libc.so.1`_thr_exit_common+0x9c
>>>>>   2     3517 mutex_ops`worker_mutex       mutex_ops`do_run+0xea
>>>>>   1     6603 libc.so.1`_uberdata+0xd40    libc.so.1`libc_init+0x2c6
>>>>>   4     1516 libc.so.1`_uberdata+0x1000   libc.so.1`_thrp_unwind+0x5e
>>>>>   1     6029 0xfffffd7fff220080           libc.so.1`pthread_create+0x135
>>>>>   1     5789 0xfffffd7fff220080           libc.so.1`_thr_exit_common+0x9c
>>>>>   1     4387 0xfffffd7fff2200c0           libc.so.1`pthread_create+0x135
>>>>>   3     1447 libc.so.1`_uberdata+0xe80    libc.so.1`find_stack+0x18a
>>>>>   3     1396 libc.so.1`_uberdata          libc.so.1`_thrp_create+0x531
>>>>>   1     4078 0xfffffd7fff220100           libc.so.1`pthread_create+0x135
>>>>>   3     1309 libc.so.1`_uberdata          libc.so.1`pthread_create+0xfe
>>>>>   1     3900 0xfffffd7fff220140           libc.so.1`pthread_create+0x135
>>>>>   1     3811 0xfffffd7fff2200c0           libc.so.1`_thr_exit_common+0x9c
>>>>>   1     3436 0xfffffd7fff220140           libc.so.1`_thr_exit_common+0x9c
>>>>>   1     2827 0xfffffd7fff220080           libc.so.1`pthread_create+0xfe
>>>>>   2     1367 libc.so.1`_uberdata+0xe40    libc.so.1`_thrp_unwind+0x5e
>>>>>   1     2369 libc.so.1`_uberdata+0xd40    libc.so.1`_thrp_create+0x370
>>>>>   1     2341 mutex_ops`worker_mutex       mutex_ops`do_run+0x1b2
>>>>>   1     2311 libc.so.1`_uberdata+0x1040   LM1`ld.so.1`rt_thr_init+0x29
>>>>>   1     2056 0xfffffd7fff220100           mutex_ops`main+0x2e7
>>>>>   1     1951 0xfffffd7fff2200c0           mutex_ops`main+0x2e7
>>>>>   1     1863 0xfffffd7fff220140           mutex_ops`main+0x2e7
>>>>>   1     1803 0xfffffd7fff220140           libc.so.1`pthread_create+0xfe
>>>>>   1     1794 libc.so.1`_uberdata          mutex_ops`main+0x22e
>>>>>   1     1782 0xfffffd7fff220100           libc.so.1`pthread_create+0xfe
>>>>>   1     1671 0xfffffd7fff2200c0           libc.so.1`pthread_create+0xfe
>>>>>   1     1622 mutex_ops`_iob+0xa8          libc.so.1`_exithandle+0x3b
>>>>>   1     1510 0xfffffd7fff220080           mutex_ops`main+0x2e7
>>>>>   1     1508 libc.so.1`_uberdata+0xe80    libc.so.1`_thrp_create+0x531
>>>>>   1     1506 libc.so.1`_uberdata+0xd40    0x400dbb
>>>>>   1     1495 libc.so.1`_uberdata+0xfc0    0x400dbb
>>>>>   1     1446 libc.so.1`_uberdata+0xfc0    libc.so.1`pthread_create+0xfe
>>>>>   1     1398 libc.so.1`_uberdata+0xd40    libc.so.1`_thrp_create+0x370
>>>>>   1     1389 libc.so.1`_uberdata          libc.so.1`pthread_create+0xfe
>>>>>   1     1370 libc.so.1`_uberdata+0xfc0    libc.so.1`pthread_create+0xfe
>>>>>   1     1347 libc.so.1`_uberdata+0xd40    0x400dc7
>>>>>   1     1310 libc.so.1`_uberdata+0xfc0    0x400dc7
>>>>>   1     1194 libc.so.1`_uberdata+0xfc0    libc.so.1`pthread_create+0xfe
>>>>>   1     1177 libc.so.1`_uberdata+0xd40    libc.so.1`_thrp_create+0x370
>>>>>
>>>>> Mutex block
>>>>>
>>>>> Count     nsec Lock                         Caller
>>>>>
>>>>>
>>>>>
>>>>> -------------------------------------------------------------------------------
>>>>>  866 21921703 mutex_ops`worker_mutex       mutex_ops`do_run+0x150
>>>>> 692495     8328 mutex_ops`worker_mutex       mutex_ops`do_run+0x150
>>>>>   2 11086954 mutex_ops`worker_mutex       mutex_ops`do_run+0x94
>>>>>   3   219879 mutex_ops`_iob+0xa8          libc.so.1`_flockget+0x1a
>>>>>   1    12013 libc.so.1`_uberdata          mutex_ops`main+0x2e7
>>>>>   1     2094 libc.so.1`_uberdata          libc.so.1`_thr_exit_common+0x9c
>>>>> << truss + mutex >>
>>>>> $ plockstat -A truss -fc mutex_ops -w4 -r10
>>>>>  number of threads:4,   run-time (in seconds)=10
>>>>>  Thread    2:   total ops: 17918909,   ops/second:  1791890
>>>>>  Thread    0:   total ops: 18518051,   ops/second:  1851805
>>>>>  Thread    1:   total ops: 18280090,   ops/second:  1828009
>>>>>  Thread    3:   total ops: 18889775,   ops/second:  1888977
>>>>>
>>>>> syscall               seconds   calls  errors
>>>>> _exit                    .000       1
>>>>> write                    .000       5
>>>>> open                     .000       3       1
>>>>> close                    .000       2
>>>>> stat                     .000      10       7
>>>>> getpid                   .000       1
>>>>> fstat                    .000       1
>>>>> ioctl                    .000       1
>>>>> execve                   .000       1
>>>>> lwp_park                 .634   94191
>>>>> lwp_unpark               .501   94191
>>>>> sigaction                .000       1
>>>>> getcontext               .000       1
>>>>> setustack                .000       5
>>>>> mmap                     .000      16
>>>>> munmap                   .000       2
>>>>> getrlimit                .000       1
>>>>> memcntl                  .000       1
>>>>> sysconfig                .000       2
>>>>> yield                    .082   12845
>>>>> lwp_create               .000       4
>>>>> lwp_exit                 .000       4
>>>>> lwp_continue             .000       4
>>>>> lwp_sigmask              .000       4
>>>>> lwp_private              .000       1
>>>>> lwp_wait                 .000       4
>>>>> lwp_mutex_wakeup         .000      10
>>>>> nanosleep                .000       1
>>>>> schedctl                 .000       5
>>>>> resolvepath              .000       4
>>>>> lwp_mutex_timedlock      .000      68
>>>>> getcwd                   .000       1
>>>>>                    --------  ------   ----
>>>>> sys totals:             1.219  201391      8
>>>>> usr time:              17.809
>>>>> elapsed:               10.030
>>>>> plockstat: pid 13499 has exited
>>>>>
>>>>> Mutex hold
>>>>>
>>>>> Count     nsec Lock                         Caller
>>>>>
>>>>>
>>>>>
>>>>> -------------------------------------------------------------------------------
>>>>> 257123     8166 truss`truss_lock             truss`worker_thread+0x3a7
>>>>> 145651    11250 truss`truss_lock             truss`worker_thread+0x3a7
>>>>> 201384     1893 truss`count_lock             truss`sysexit+0x82d
>>>>>   1  2468522 libc.so.1`_uberdata+0x60     libproc.so.1`Pxcreate+0x63
>>>>>   1  2447046 libnsl.so.1`_ti_userlock     libnsl.so.1`_libnsl_parent_atfork+0x10
>>>>>   1  2380540 libc.so.1`_uberdata+0xfc0    libc.so.1`fork+0x130
>>>>>   1  2364179 libc.so.1`_first_link_lock   libc.so.1`fork+0x130
>>>>>   1  2356270 libc.so.1`libc_malloc_lock   libc.so.1`fork+0x130
>>>>>   1  2337553 libc.so.1`_uberdata+0x40     libproc.so.1`Pxcreate+0x63
>>>>>   4    52074 0x455478                     libproc.so.1`Lgrab+0x234
>>>>>   1   144394 libc.so.1`_uberdata+0xfc0    truss`main_thread+0x1e8
>>>>>   4    32702 truss`truss_lock             truss`worker_thread+0x5d6
>>>>>   5    23212 0x455478                     libproc.so.1`Lfree+0x3a
>>>>>   1    66469 0xfef9e1b0                   0xfef06df3
>>>>>   1    64668 0xfefa12c0                   0xfef67a91
>>>>>  32     1858 truss`_iob+0xa8              truss`report+0x223
>>>>>   1    58652 0x455478                     libproc.so.1`Lgrab+0x234
>>>>>   5    11034 libc.so.1`_uberdata          libc.so.1`_thr_exit_common+0x9c
>>>>>  32     1583 truss`_iob+0xa8              truss`prtim+0x51
>>>>>  32     1547 truss`_iob+0xa8              truss`report+0x25e
>>>>>   2    23108 libc.so.1`libc_malloc_lock   truss`my_realloc+0x18
>>>>>  32     1412 truss`_iob+0xa8              truss`prtim+0x64
>>>>>   1    40708 0xfefa12a0                   0xfef021df
>>>>>   1    40112 libc.so.1`_uberdata+0x60     LM1`ld.so.1`rt_thr_init+0x29
>>>>>   1    34837 libc.so.1`_first_link_lock   libc.so.1`_exithandle+0x3b
>>>>>   5     6891 truss`truss_lock             truss`main_thread+0x116
>>>>>   1    34270 0xfef9e5d0                   0xfef14052
>>>>>   4     8244 libc.so.1`_uberdata+0x40     truss`worker_thread+0xe52
>>>>>   1    31319 libc.so.1`libc_malloc_lock   truss`my_realloc+0x18
>>>>>   5     6235 libc.so.1`_uberdata          truss`main_thread+0x125
>>>>>   4     7557 libc.so.1`libc_malloc_lock   libproc.so.1`Lgrab+0xd3
>>>>>   5     6010 libc.so.1`_uberdata+0x1040   libc.so.1`_lwp_start
>>>>>   1    29687 libc.so.1`libc_malloc_lock   truss`main+0x4ac
>>>>>   1    28272 libc.so.1`mktemp_lock.0      truss`main+0xdb4
>>>>>   1    26436 truss`truss_lock             truss`worker_thread+0x5d6
>>>>>   1    25322 truss`_iob+0xa8              libc.so.1`_exithandle+0x3b
>>>>>   1    25197 libc.so.1`libc_malloc_lock   truss`my_realloc+0x18
>>>>>   4     6142 libc.so.1`_uberdata          truss`worker_thread+0xe34
>>>>>   1    21847 libc.so.1`libc_malloc_lock   truss`my_calloc+0x18
>>>>>   1    19740 libproc.so.1`minfd_lock.0    libproc.so.1`set_minfd+0x46
>>>>>   1    17725 0xfffffd7fff060140           libc.so.1`_thr_exit_common+0x9c
>>>>>   1    14834 libc.so.1`libc_malloc_lock   truss`free_private+0x44
>>>>>   5     2865 libc.so.1`libc_malloc_lock   truss`my_realloc+0x18
>>>>>   1    12900 0xfffffd7fff200080           truss`Xcritical+0x46
>>>>>   5     2470 libc.so.1`_uberdata          libc.so.1`_thr_exit_common+0x9c
>>>>>   1    12239 libc.so.1`__sbrk_lock        libc.so.1`_malloc_unlocked+0x10d
>>>>>   1    12155 0xfef9e1f0                   0xfef0703b
>>>>>   1    12062 libc.so.1`libc_malloc_lock   truss`my_realloc+0x18
>>>>>   6     2004 truss`count_lock             truss`worker_thread+0x108a
>>>>>   1    11918 libc.so.1`_uberdata+0x40     truss`main_thread+0x1ed
>>>>>   5     2347 truss`count_lock             truss`worker_thread+0xb09
>>>>>   1    11608 0xfef9f708                   0xfef26ac9
>>>>>   5     2215 libc.so.1`libc_malloc_lock   libproc.so.1`Lfree_internal+0x67
>>>>>   1    10685 libc.so.1`update_lock        libctf.so.1`_libctf_init+0x10
>>>>>   1    10092 libc.so.1`_uberdata          libproc.so.1`Pxcreate+0x63
>>>>>   5     2016 libc.so.1`_uberdata+0x1040   libc.so.1`_thrp_unwind+0x5e
>>>>>   1     9656 0x455478                     libproc.so.1`Pfree+0x117
>>>>>   2     4810 libc.so.1`__sbrk_lock        libc.so.1`realloc+0x235
>>>>>   5     1871 0xfffffd7fff200080           truss`Xcritical+0x46
>>>>>   6     1543 libc.so.1`_uberdata+0xd40    libc.so.1`exit+0x11
>>>>>   5     1744 libc.so.1`libc_malloc_lock   truss`free_private+0x55
>>>>>   5     1738 libc.so.1`_uberdata+0x1000   libc.so.1`_thrp_unwind+0x5e
>>>>>   5     1688 libc.so.1`libc_malloc_lock   truss`free_private+0x66
>>>>>   1     8407 truss`count_lock             truss`setup_basetime+0x60
>>>>>   1     8252 libc.so.1`libc_malloc_lock   truss`my_realloc+0x18
>>>>>   2     4044 libc.so.1`__sbrk_lock        libc.so.1`malloc+0x2d
>>>>>   1     8072 libc.so.1`libc_malloc_lock   libproc.so.1`Lgrab+0x72
>>>>>   5     1612 libc.so.1`_uberdata+0x1000   libc.so.1`_thrp_unwind+0x5e
>>>>>   1     8046 libc.so.1`libc_malloc_lock   truss`my_realloc+0x18
>>>>>   5     1543 libc.so.1`libc_malloc_lock   truss`my_realloc+0x18
>>>>>   2     3818 libc.so.1`__sbrk_lock        libc.so.1`realloc+0x235
>>>>>   4     1897 libc.so.1`_uberdata          libc.so.1`thr_create+0x3f
>>>>>   1     7516 0xfef9e5d0                   0xfef14052
>>>>>   5     1497 libc.so.1`libc_malloc_lock   truss`my_realloc+0x18
>>>>>   1     7375 0xfffffd7fff060080           truss`main_thread+0x1ed
>>>>>   1     7372 libc.so.1`_uberdata          truss`create_thread+0x58
>>>>>   1     6977 libc.so.1`_uberdata+0xd40    libc.so.1`libc_init+0x2c6
>>>>>   1     6664 0xfefa1f80                   0xfef63d56
>>>>>   4     1663 libc.so.1`_uberdata+0xe80    libc.so.1`find_stack+0x18a
>>>>>   1     6569 0xfffffd7fff060180           libc.so.1`_thr_exit_common+0x9c
>>>>>   2     3186 libc.so.1`__sbrk_lock        libc.so.1`malloc+0x2d
>>>>>   1     6260 libc.so.1`_uberdata+0x1000   truss`main+0xf8
>>>>>   1     5847 libc.so.1`libc_malloc_lock   libproc.so.1`Lgrab+0xd3
>>>>>   1     5799 libc.so.1`_uberdata+0x60     libnsl.so.1`_libnsl_lock_init+0x86
>>>>>   1     5664 0xfffffd7fff060140           truss`worker_thread+0xe52
>>>>>   1     5502 0xfefa2240                   0xfef734b2
>>>>>   4     1338 libc.so.1`_uberdata          libc.so.1`_thrp_create+0x531
>>>>>   1     5248 0xfffffd7fff0600c0           truss`worker_thread+0xe52
>>>>>   1     5204 libc.so.1`__sbrk_lock        libc.so.1`_malloc_unlocked+0x10d
>>>>>   1     5141 truss`count_lock             truss`per_proc_init+0xe1
>>>>>   1     5105 0xfefa10e0                   0xfef62237
>>>>>   1     4904 0xfefa10e0                   0xfef62237
>>>>>   3     1613 libc.so.1`libc_malloc_lock   truss`free_private+0x44
>>>>>   1     4403 libc.so.1`_uberdata+0x140    truss`main+0xbeb
>>>>>   1     4245 0xfffffd7fff060100           truss`worker_thread+0xe52
>>>>>   1     3702 0xfffffd7fff0600c0           libc.so.1`_thr_exit_common+0x9c
>>>>>   1     3469 truss`_iob+0xa8              truss`report+0x418
>>>>>   2     1702 libc.so.1`libc_malloc_lock   truss`my_realloc+0x18
>>>>>   1     3396 0xfffffd7fff060100           libc.so.1`_thr_exit_common+0x9c
>>>>>   2     1641 libc.so.1`_uberdata+0xe40    libc.so.1`_thrp_unwind+0x5e
>>>>>   2     1514 truss`_iob+0xa8              truss`report+0x366
>>>>>   2     1477 libc.so.1`libc_malloc_lock   truss`my_realloc+0x18
>>>>>   2     1369 libc.so.1`libc_malloc_lock   truss`sysentry+0x4e7
>>>>>   1     2653 0xfffffd7fff060080           libc.so.1`thr_create+0x3f
>>>>>   1     2646 libc.so.1`_uberdata+0x140    truss`main+0xbff
>>>>>   1     2528 truss`truss_lock             truss`main_thread+0x116
>>>>>   1     2470 libc.so.1`_uberdata+0x580    truss`main+0x554
>>>>>   1     2452 0xfffffd7fff200080           truss`Xcritical+0x46
>>>>>   1     2391 0xfffffd7fff060080           libc.so.1`_thr_exit_common+0x9c
>>>>>   1     2223 libc.so.1`_uberdata+0x440    truss`main+0x545
>>>>>   1     2219 libc.so.1`libc_malloc_lock   truss`my_realloc+0x18
>>>>>   1     2203 libc.so.1`_uberdata+0x4c0    truss`main+0x518
>>>>>   1     2187 libc.so.1`_uberdata+0x500    truss`main+0x527
>>>>>   1     2185 libc.so.1`_uberdata+0xd40    libc.so.1`_thrp_create+0x370
>>>>>   1     2182 0xfefa2280                   0xfef73271
>>>>>   1     2127 0xfffffd7fff060080           truss`main_thread+0x125
>>>>>   1     2113 0xfffffd7fff060180           truss`worker_thread+0xe52
>>>>>   1     2090 libc.so.1`_uberdata+0x1040   LM1`ld.so.1`rt_thr_init+0x29
>>>>>   1     2082 libc.so.1`libc_malloc_lock   truss`my_realloc+0x18
>>>>>   1     2072 libc.so.1`_uberdata+0x540    truss`main+0x536
>>>>>   1     2057 libc.so.1`_uberdata+0x180    truss`main+0xc0e
>>>>>   1     2041 libc.so.1`_uberdata+0x1c0    truss`main+0xc31
>>>>>   1     2035 0xfffffd7fff060140           truss`main_thread+0x125
>>>>>   1     2013 0xfffffd7fff060140           libc.so.1`thr_create+0x3f
>>>>>   1     1968 libc.so.1`libc_malloc_lock   truss`main+0xdf1
>>>>>   1     1961 0xfefa2000                   0xfef63d56
>>>>>   1     1954 libc.so.1`_uberdata          libproc.so.1`Pxcreate+0x63
>>>>>   1     1927 0xfffffd7fff0600c0           truss`main_thread+0x125
>>>>>   1     1923 libc.so.1`_uberdata+0x180    truss`main+0xc22
>>>>>   1     1918 0xfffffd7fff060100           libc.so.1`thr_create+0x3f
>>>>>   1     1917 truss`_iob+0xa8              truss`prtim+0x20
>>>>>   1     1891 0xfffffd7fff060180           truss`main_thread+0x125
>>>>>   1     1890 libc.so.1`_uberdata+0x1c0    truss`main+0x507
>>>>>   1     1884 libc.so.1`seed_lk            truss`main+0xdb4
>>>>>   1     1877 libc.so.1`libc_malloc_lock   libproc.so.1`Pfree+0x10f
>>>>>   1     1875 libc.so.1`libc_malloc_lock   libproc.so.1`Pxcreate+0x52
>>>>>   1     1867 libc.so.1`libc_malloc_lock   truss`main+0xdb4
>>>>>   1     1832 0xfffffd7fff0600c0           libc.so.1`thr_create+0x3f
>>>>>   1     1830 0xfffffd7fff060100           truss`main_thread+0x125
>>>>>   1     1770 libc.so.1`_uberdata+0xfc0    libc.so.1`thr_create+0x3f
>>>>>   1     1765 libc.so.1`libc_malloc_lock   truss`my_realloc+0x18
>>>>>   1     1754 truss`_iob+0x28              libc.so.1`_exithandle+0x3b
>>>>>   1     1732 libc.so.1`_uberdata+0xd40    truss`main+0xf8
>>>>>   1     1723 libc.so.1`_uberdata+0xd40    truss`_start+0x1b
>>>>>   1     1695 truss`_iob+0xa8              truss`report+0x4af
>>>>>   1     1681 libc.so.1`_uberdata+0xd40    libnsl.so.1`_libnsl_lock_init+0x86
>>>>>   1     1674 libc.so.1`_uberdata+0xe80    libc.so.1`_thrp_create+0x531
>>>>>   1     1649 0xfefa1f80                   0xfef63d56
>>>>>   1     1648 truss`_iob+0xa8              truss`report+0x492
>>>>>   1     1647 0xfef9e5d0                   0xfef14052
>>>>>   1     1642 0xfffffd7fff200080           truss`Xcritical+0x46
>>>>>   1     1640 truss`_iob+0xa8              truss`report+0x486
>>>>>   1     1637 truss`_iob+0xa8              truss`prtim+0x20
>>>>>   1     1624 0xfffffd7fff060180           libc.so.1`thr_create+0x3f
>>>>>   1     1616 0xfefa2200                   0xfef01d17
>>>>>   1     1608 libc.so.1`_uberdata          truss`create_thread+0x58
>>>>>   1     1608 truss`_iob+0xa8              truss`prtim+0x51
>>>>>   1     1547 truss`_iob+0xa8              truss`report+0x53e
>>>>>   1     1547 truss`_iob+0xa8              truss`prtim+0x51
>>>>>   1     1533 0xfefa2040                   0xfef63d56
>>>>>   1     1518 0xfefa1f80                   0xfef63d56
>>>>>   1     1505 libc.so.1`libc_malloc_lock   libproc.so.1`Pfree+0x1cf
>>>>>   1     1488 0xfefa1f80                   0xfef63d56
>>>>>   1     1466 truss`_iob+0xa8              truss`prtim+0x51
>>>>>   1     1445 libc.so.1`libc_malloc_lock   truss`free_private+0x88
>>>>>   1     1443 libc.so.1`libc_malloc_lock   truss`my_realloc+0x18
>>>>>   1     1426 truss`_iob+0xa8              truss`report+0x4cb
>>>>>   1     1424 truss`_iob+0xa8              truss`prtim+0x20
>>>>>   1     1415 0xfef9e5d0                   0xfef14052
>>>>>   1     1404 libc.so.1`_uberdata+0xfc0    libc.so.1`thr_create+0x3f
>>>>>   1     1397 libc.so.1`_uberdata+0xd40    libc.so.1`_thrp_create+0x370
>>>>>   1     1378 libc.so.1`_uberdata+0xd40    libc.so.1`_thrp_create+0x370
>>>>>   1     1372 libc.so.1`_uberdata          libc.so.1`thr_create+0x3f
>>>>>   1     1363 libc.so.1`libc_malloc_lock   truss`free_private+0x90
>>>>>   1     1345 libc.so.1`_uberdata+0xd40    truss`_start+0x27
>>>>>   1     1343 0xfefa1f80                   0xfef63d56
>>>>>   1     1341 libc.so.1`_uberdata+0xfc0    truss`_start+0x1b
>>>>>   1     1340 0xfefa2200                   0xfef01d17
>>>>>   1     1324 0xfefa1f80                   0xfef63d56
>>>>>   1     1284 libc.so.1`_uberdata+0xfc0    libc.so.1`thr_create+0x3f
>>>>>   1     1142 libc.so.1`_uberdata+0xfc0    truss`_start+0x27
>>>>>
>>>>> Mutex block
>>>>>
>>>>> Count     nsec Lock                         Caller
>>>>>
>>>>>
>>>>>
>>>>> -------------------------------------------------------------------------------
>>>>> 124379    34360 truss`truss_lock             truss`worker_thread+0x27d
>>>>> 88153     2193 truss`truss_lock             truss`worker_thread+0x27d
>>>>>   3    55924 truss`truss_lock             truss`worker_thread+0x5c4
>>>>>   1    12080 libc.so.1`libc_malloc_lock   libc.so.1`free+0x18
>>>>>   1     2286 truss`truss_lock             truss`main_thread+0xed
>>>>> $
>>>>>
>>>>> On 7/1/08, [EMAIL PROTECTED] <[EMAIL PROTECTED]> wrote:
>>>>>
>>>>>
>>>>>
>>>>>
>>>>>> Could you please provide the lock statistics from plockstat for each of
>>>>>> these cases?  I think I know what you are seeing, but I want data to back
>>>>>> up the theory.
>>>>>>
>>>>>>  Thanks
>>>>>>
>>>>>>  Dave Valin
>>>>>>
>>>>>>
>>>>>>  senthil ramanujam wrote:
>>>>>>
>>>>>>
>>>>>>
>>>>>>
>>>>>>
>>>>>>> Hi,
>>>>>>>
>>>>>>> I am seeing a strange performance behavior with pthread mutex
>>>>>>> operations on my Solaris (4-core) system running S10U5.
>>>>>>>
>>>>>>> The issue is that if the program is run with truss enabled, I get about
>>>>>>> 3 times the performance of the run without truss enabled. Strange, right?
>>>>>>>
>>>>>>> The C program does the following:
>>>>>>>  - accepts 2 parameters [number of concurrent threads (N), and run-time]
>>>>>>>  - creates and runs N threads simultaneously for <run-time> seconds
>>>>>>>  - each thread does the following (see the sketch below):
>>>>>>>      - acquires the pthread mutex
>>>>>>>      - increases the value of its _own_ long variable by 1
>>>>>>>      - releases the pthread mutex
>>>>>>>  - after the run-time expires, each thread reports the value of its
>>>>>>>    long variable
>>>>>>>
>>>>>>> Note that I did not use any special flags to compile it. Reply back to
>>>>>>> this email if you would like to take a look at the program source code.
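>>>>>>>
>>>>>>> [The source itself was not posted; this is a minimal sketch of the loop
>>>>>>> described above, assuming POSIX threads. Apart from worker_mutex, which
>>>>>>> shows up in the plockstat output further down, the names are illustrative:]
>>>>>>>
>>>>>>>  #include <pthread.h>
>>>>>>>
>>>>>>>  pthread_mutex_t worker_mutex = PTHREAD_MUTEX_INITIALIZER;
>>>>>>>  volatile int    run_flag = 1;   /* cleared by main() when run-time expires */
>>>>>>>  long            counts[64];     /* one slot per worker thread */
>>>>>>>
>>>>>>>  void *
>>>>>>>  worker(void *arg)
>>>>>>>  {
>>>>>>>          long *my_count = &counts[(long)arg];
>>>>>>>
>>>>>>>          while (run_flag) {
>>>>>>>                  pthread_mutex_lock(&worker_mutex);   /* acquire */
>>>>>>>                  (*my_count)++;                       /* bump own counter */
>>>>>>>                  pthread_mutex_unlock(&worker_mutex); /* release */
>>>>>>>          }
>>>>>>>          return (NULL);
>>>>>>>  }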
>>>>>>>
>>>>>>> Let's look at the program without truss enabled.
>>>>>>>
>>>>>>> $ mutex_ops -w4 -r10
>>>>>>>  number of threads:4,   run-time (in seconds)=10
>>>>>>>  Thread    1:   total ops: 4580775,   ops/second:   458077
>>>>>>>  Thread    3:   total ops: 3798001,   ops/second:   379800
>>>>>>>  Thread    0:   total ops: 4867850,   ops/second:   486785
>>>>>>>  Thread    2:   total ops: 4925299,   ops/second:   492529
>>>>>>> $
>>>>>>> <<vmstat 1>>
>>>>>>>  kthr      memory            page            disk          faults      cpu
>>>>>>>  r b w   swap  free  re  mf pi po fr de sr s0 s1 -- --   in   sy   cs us sy id
>>>>>>>  0 0 0 17121968 14961748 22 205 0 0 0 0  0  0  0  0  0 7960 16218 16155 28 2 71
>>>>>>>  0 0 0 17121968 14961748 0 0 0  0  0  0  0  0  0  0  0 19914 41572 41568 81 3 16
>>>>>>>  1 0 0 17121968 14961748 0 0 0  0  0  0  0  0  0  0  0 18711 39086 38937 80 3 17
>>>>>>>  0 0 0 17121968 14961748 0 0 0  0  0  0  0  0  0  0  0 19006 39373 39382 80 3 17
>>>>>>>  0 0 0 17121968 14961748 0 0 0  0  0  0  0  0  0  0  0 18719 39151 39007 80 3 16
>>>>>>>  0 0 0 17121968 14961748 0 0 0  0  0  0  0  0  0  0  0 19560 41243 40963 81 4 16
>>>>>>>  0 0 0 17121968 14961748 0 0 0  0  0  0  0  0  0  0  0 20498 43671 43203 81 4 16
>>>>>>>  0 0 0 17121968 14961748 0 0 0  0  0  0  0  0  0  0  0 20857 44636 44014 80 4 15
>>>>>>>  0 0 0 17121968 14961748 0 0 0  0  0  0  0  0  0  0  0 19149 41239 40583 80 4 16
>>>>>>>  0 0 0 17121968 14961748 0 0 0  0  0  0  0  0  0  0  0 21019 44189 43995 81 4 15
>>>>>>>  0 0 0 17122372 14962176 7 29 0 0  0  0  0  0  0  0  0 13062 27003 26973 46 2 52
>>>>>>>  0 0 0 17122372 14962176 0 0 0  0  0  0  0  0  0  0  0  461  160 274 0 0 100
>>>>>>>
>>>>>>> As you can see above, I am executing the program with 4 concurrent
>>>>>>> threads for 10 seconds. The output shows that each thread acquired the
>>>>>>> mutex lock about 400,000 times a second. The vmstat output shows that
>>>>>>> the program used about 80% user time, 4% system time, and ~15% idle.
>>>>>>> So far so good.
>>>>>>>
>>>>>>> Now, let's look at the program with truss enabled.
>>>>>>>
>>>>>>> $ truss -fc mutex_ops -w4 -r10
>>>>>>>  number of threads:4,   run-time (in seconds)=10
>>>>>>>  Thread    0:   total ops: 13419838,   ops/second:  1341983
>>>>>>>  Thread    2:   total ops: 13959115,   ops/second:  1395911
>>>>>>>  Thread    1:   total ops: 13414291,   ops/second:  1341429
>>>>>>>  Thread    3:   total ops: 13535670,   ops/second:  1353567
>>>>>>>
>>>>>>> syscall               seconds   calls  errors
>>>>>>> _exit                    .000       1
>>>>>>> write                    .000       5
>>>>>>> open                     .000       3       1
>>>>>>> close                    .000       2
>>>>>>> stat                     .000      10       7
>>>>>>> getpid                   .000       1
>>>>>>> fstat                    .000       1
>>>>>>> ioctl                    .000       1
>>>>>>> execve                   .000       1
>>>>>>> lwp_park                 .905  106423
>>>>>>> lwp_unpark               .757  106423
>>>>>>> sigaction                .000       1
>>>>>>> getcontext               .000       1
>>>>>>> setustack                .000       5
>>>>>>> mmap                     .000      16
>>>>>>> munmap                   .000       2
>>>>>>> getrlimit                .000       1
>>>>>>> memcntl                  .000       1
>>>>>>> sysconfig                .000       2
>>>>>>> yield                    .182   26004
>>>>>>> lwp_create               .000       4
>>>>>>> lwp_exit                 .000       4
>>>>>>> lwp_continue             .000       4
>>>>>>> lwp_sigmask              .000       4
>>>>>>> lwp_private              .000       1
>>>>>>> lwp_wait                 .000       4
>>>>>>> lwp_mutex_wakeup         .000       9
>>>>>>> nanosleep                .000       1
>>>>>>> schedctl                 .000       5
>>>>>>> resolvepath              .000       4
>>>>>>> lwp_mutex_timedlock      .000      32
>>>>>>> getcwd                   .000       1
>>>>>>>                   --------  ------   ----
>>>>>>> sys totals:             1.846  238977      8
>>>>>>> usr time:              21.363
>>>>>>> elapsed:               10.060
>>>>>>> $
>>>>>>>
>>>>>>> <<vmstat 1>>
>>>>>>>  kthr      memory            page            disk          faults      cpu
>>>>>>>  r b w   swap  free  re  mf pi po fr de sr s0 s1 -- --   in   sy   cs us sy id
>>>>>>>  0 0 0 17122372 14962176 0 0 0  0  0  0  0  0  0  0  0  529  262 335 0 0 100
>>>>>>>  0 0 0 17121240 14960660 153 632 0 0 0 0 0  0  0  0  0 25259 91754 87992 37 15 48
>>>>>>>  0 0 0 17121104 14960524 0 0 0  0  0  0  0  0  0  0  0 62751 178228 179102 57 33 10
>>>>>>>  0 0 0 17121104 14960524 0 0 0  0  0  0  0  0  0  0  0 56922 169565 168507 59 30 11
>>>>>>>  0 0 0 17121104 14960524 0 0 0  0  0  0  0  0  0  0  0 60935 178967 178535 58 32 10
>>>>>>>  0 0 0 17121104 14960524 0 0 0  0  0  0  0  0  0  0  0 61581 180239 179751 58 32 10
>>>>>>>  0 0 0 17121104 14960524 0 0 0  0  0  0  0  0  0  0  0 61486 181664 180513 58 33 10
>>>>>>>  1 0 0 17121104 14960524 0 0 0  0  0  0  0  0  0  0  0 67941 186205 190703 55 35 10
>>>>>>>  0 0 0 17121104 14960524 0 0 0  0  0  0  0  0  0  0  0 59981 175708 175758 58 32 10
>>>>>>>  1 0 0 17121104 14960524 0 0 0  0  0  0  0  0  0  0  0 60485 174557 175091 58 32 11
>>>>>>>  1 0 0 17121104 14960524 0 0 0  0  0  0  0  0  0  0  0 62799 178328 179880 57 32 10
>>>>>>>  0 0 0 17122236 14962040 8 17 0 0  0  0  0  0  0  0  0 19483 58122 58317 19 10 71
>>>>>>>  0 0 0 17122236 14962040 0 0 0  0  0  0  0  0  0  0  0  449  160 279 0 0 100
>>>>>>>
>>>>>>> Can you spot the issue? I used the same parameters (4 concurrent
>>>>>>> threads, 10 seconds run-time), but this time *with truss enabled*. The
>>>>>>> output shows that each thread acquired the mutex lock about 1,300,000
>>>>>>> times a second. vmstat shows that the program is using a lot more
>>>>>>> system time than before. However, I don't understand how truss can
>>>>>>> drive the system time up and at the same time increase the performance.
>>>>>>> Am I missing something?
>>>>>>>
>>>>>>> BTW, I confirmed the mutex lock counts reported by the program using
>>>>>>> dtrace. The values are indeed correct in both cases.
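>>>>>>>
>>>>>>> [One hedged way to do that cross-check with the DTrace plockstat
>>>>>>> provider; the exact one-liner is illustrative:]
>>>>>>>
>>>>>>> # dtrace -n 'plockstat$target:::mutex-acquire { @acquires = count(); }' \
>>>>>>>       -c "mutex_ops -w4 -r10"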
>>>>>>>
>>>>>>> Any inputs/comments are appreciated.
>>>>>>>
>>>>>>> thanks.
>>>>>>>
>>>>>>> senthil
>>>>>>>
>>>>>>>
>>>>>>>
>>>>>>>
>>>>>>>
>>>>>>>
>>>>>>
>>>>
>>


-- 
Neelakanth Nadgir      http://blogs.sun.com/realneel
_______________________________________________
perf-discuss mailing list
perf-discuss@opensolaris.org
