Could you please provide lock statistics from plockstat for each of
these cases?  I think I know what you are seeing, but I want data to
back up the theory.
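
Something like this should do it (-A records both lock-hold and
contention events, and plockstat runs the target command directly):

$ plockstat -A mutex_ops -w4 -r10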

Thanks

Dave Valin


senthil ramanujam wrote:
> Hi,
>
> I am seeing a strange performance behavior with pthread mutex
> operations on my Solaris (4-core) system running S10U5.
>
> The issue is that if the program is run under truss, I get about 3
> times the performance of the run without truss. Strange... right?
>
> The C program does the following:
>    - accept 2 parameters [number of concurrent threads (N), and run-time]
>    - create and run N threads simultaneously for <run-time> seconds
>    - each thread does the following:
>         - acquires the pthread mutex
>         - increases the value of its _own_ long variable by 1
>         - releases the pthread mutex
>    - after run-time expires, each thread reports the value of its long
> variable
>
> Note that I did not use any special flags to compile it. Reply to
> this email if you'd like to take a look at the actual source code; a
> rough sketch of the worker loop is below.
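>
> Something like this (a minimal sketch with illustrative names and
> simplified option handling, not the actual source):
>
> #include <pthread.h>
> #include <stdio.h>
> #include <unistd.h>
>
> #define MAX_THREADS 64
>
> static pthread_mutex_t lock = PTHREAD_MUTEX_INITIALIZER;
> static volatile int done = 0;       /* flipped by main() when run-time expires */
> static long counts[MAX_THREADS];    /* one private counter per thread */
>
> static void *worker(void *arg)
> {
>     long *my_count = arg;
>
>     while (!done) {
>         pthread_mutex_lock(&lock);  /* one mutex shared by all N threads */
>         (*my_count)++;              /* each thread bumps only its own counter */
>         pthread_mutex_unlock(&lock);
>     }
>     return NULL;
> }
>
> int main(void)
> {
>     int nthreads = 4, runtime = 10; /* stand-ins for the -w and -r options */
>     pthread_t tids[MAX_THREADS];
>     int i;
>
>     for (i = 0; i < nthreads; i++)
>         pthread_create(&tids[i], NULL, worker, &counts[i]);
>     sleep(runtime);                 /* let the workers run for <run-time> */
>     done = 1;
>     for (i = 0; i < nthreads; i++) {
>         pthread_join(tids[i], NULL);
>         printf("Thread %4d:   total ops: %ld,   ops/second: %ld\n",
>                i, counts[i], counts[i] / runtime);
>     }
>     return 0;
> }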
>
> Let's look at the program without truss enabled.
>
> $ mutex_ops -w4 -r10
>    number of threads:4,   run-time (in seconds)=10
>    Thread    1:   total ops: 4580775,   ops/second:   458077
>    Thread    3:   total ops: 3798001,   ops/second:   379800
>    Thread    0:   total ops: 4867850,   ops/second:   486785
>    Thread    2:   total ops: 4925299,   ops/second:   492529
> $
> <<vmstat 1>>
>  kthr      memory            page            disk          faults      cpu
>  r b w   swap  free  re  mf pi po fr de sr s0 s1 -- --   in   sy   cs us sy id
>  0 0 0 17121968 14961748 22 205 0 0 0 0  0  0  0  0  0 7960 16218 16155 28 2 71
>  0 0 0 17121968 14961748 0 0 0  0  0  0  0  0  0  0  0 19914 41572 41568 81 3 16
>  1 0 0 17121968 14961748 0 0 0  0  0  0  0  0  0  0  0 18711 39086 38937 80 3 17
>  0 0 0 17121968 14961748 0 0 0  0  0  0  0  0  0  0  0 19006 39373 39382 80 3 17
>  0 0 0 17121968 14961748 0 0 0  0  0  0  0  0  0  0  0 18719 39151 39007 80 3 16
>  0 0 0 17121968 14961748 0 0 0  0  0  0  0  0  0  0  0 19560 41243 40963 81 4 16
>  0 0 0 17121968 14961748 0 0 0  0  0  0  0  0  0  0  0 20498 43671 43203 81 4 16
>  0 0 0 17121968 14961748 0 0 0  0  0  0  0  0  0  0  0 20857 44636 44014 80 4 15
>  0 0 0 17121968 14961748 0 0 0  0  0  0  0  0  0  0  0 19149 41239 40583 80 4 16
>  0 0 0 17121968 14961748 0 0 0  0  0  0  0  0  0  0  0 21019 44189 43995 81 4 15
>  0 0 0 17122372 14962176 7 29 0 0  0  0  0  0  0  0  0 13062 27003 26973 46 2 52
>  0 0 0 17122372 14962176 0 0 0  0  0  0  0  0  0  0  0  461  160  274  0  0 100
>
> As you can see above, I am executing the program with 4 concurrent
> threads for 10 seconds. The output shows that each thread acquired
> the mutex lock about 400,000 times a second, and vmstat shows the
> program used about 80% user time, 4% system time, and ~15% idle. So
> far so good.
>
> Now, let's look at the same program run under truss.
>
> $ truss -fc mutex_ops -w4 -r10
>    number of threads:4,   run-time (in seconds)=10
>    Thread    0:   total ops: 13419838,   ops/second:  1341983
>    Thread    2:   total ops: 13959115,   ops/second:  1395911
>    Thread    1:   total ops: 13414291,   ops/second:  1341429
>    Thread    3:   total ops: 13535670,   ops/second:  1353567
>
> syscall               seconds   calls  errors
> _exit                    .000       1
> write                    .000       5
> open                     .000       3       1
> close                    .000       2
> stat                     .000      10       7
> getpid                   .000       1
> fstat                    .000       1
> ioctl                    .000       1
> execve                   .000       1
> lwp_park                 .905  106423
> lwp_unpark               .757  106423
> sigaction                .000       1
> getcontext               .000       1
> setustack                .000       5
> mmap                     .000      16
> munmap                   .000       2
> getrlimit                .000       1
> memcntl                  .000       1
> sysconfig                .000       2
> yield                    .182   26004
> lwp_create               .000       4
> lwp_exit                 .000       4
> lwp_continue             .000       4
> lwp_sigmask              .000       4
> lwp_private              .000       1
> lwp_wait                 .000       4
> lwp_mutex_wakeup         .000       9
> nanosleep                .000       1
> schedctl                 .000       5
> resolvepath              .000       4
> lwp_mutex_timedlock      .000      32
> getcwd                   .000       1
>                      --------  ------   ----
> sys totals:             1.846  238977      8
> usr time:              21.363
> elapsed:               10.060
> $
>
> <<vmstat 1>>
>  kthr      memory            page            disk          faults      cpu
>  r b w   swap  free  re  mf pi po fr de sr s0 s1 -- --   in   sy   cs us sy id
>  0 0 0 17122372 14962176 0 0 0  0  0  0  0  0  0  0  0  529  262  335  0  0 100
>  0 0 0 17121240 14960660 153 632 0 0 0 0 0  0  0  0  0 25259 91754 87992 37 15 48
>  0 0 0 17121104 14960524 0 0 0  0  0  0  0  0  0  0  0 62751 178228 179102 57 33 10
>  0 0 0 17121104 14960524 0 0 0  0  0  0  0  0  0  0  0 56922 169565 168507 59 30 11
>  0 0 0 17121104 14960524 0 0 0  0  0  0  0  0  0  0  0 60935 178967 178535 58 32 10
>  0 0 0 17121104 14960524 0 0 0  0  0  0  0  0  0  0  0 61581 180239 179751 58 32 10
>  0 0 0 17121104 14960524 0 0 0  0  0  0  0  0  0  0  0 61486 181664 180513 58 33 10
>  1 0 0 17121104 14960524 0 0 0  0  0  0  0  0  0  0  0 67941 186205 190703 55 35 10
>  0 0 0 17121104 14960524 0 0 0  0  0  0  0  0  0  0  0 59981 175708 175758 58 32 10
>  1 0 0 17121104 14960524 0 0 0  0  0  0  0  0  0  0  0 60485 174557 175091 58 32 11
>  1 0 0 17121104 14960524 0 0 0  0  0  0  0  0  0  0  0 62799 178328 179880 57 32 10
>  0 0 0 17122236 14962040 8 17 0 0  0  0  0  0  0  0  0 19483 58122 58317 19 10 71
>  0 0 0 17122236 14962040 0 0 0  0  0  0  0  0  0  0  0  449  160  279  0  0 100
>
> Can you spot the issue? I used the same parameters (4 concurrent
> threads, 10 seconds run-time), but this time *with truss enabled*. The
> output shows that each thread acquired the mutex lock about
> 1,300,000 times a second, and vmstat shows the program using far
> more system time than before. I don't understand how truss can drive
> system time up and yet end up increasing the throughput. Am I
> missing something?
>
> BTW, I confirmed the mutex-acquisition counts reported by the program
> using dtrace; the values are indeed correct in both cases.
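>
> (For reference, the dtrace check was along these lines; this assumes
> the pid provider can see libc's pthread_mutex_lock entry point:)
>
> $ dtrace -n 'pid$target:libc:pthread_mutex_lock:entry { @[tid] = count(); }' \
>     -c 'mutex_ops -w4 -r10'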
>
> Any inputs/comments are appreciated.
>
> thanks.
>
> senthil

_______________________________________________
perf-discuss mailing list
perf-discuss@opensolaris.org
