Hi,

I am running a multithreaded application with 20 threads on my 24-core
AMD Opteron (ccNUMA) machine running Solaris 10. When I run the application
with its threads bound to cores using pbind (one thread per core), its
performance degrades dramatically: there is around an 80% performance loss
with binding. To understand this, I used "prstat -m" and found that without
binding (the default case) the % lock contention (LCK field) is around 13%,
but with binding it is around 30%. Moreover, the % latency (LAT field) is
almost zero without binding, but with binding it is around 37%.
Please find the USR, LCK and LAT fields of the prstat output below.

Configuration   USR    LCK  LAT
--------------------------------
No-Binding      86     13   0.1
Binding         32     30   37 


Therefore, the application with binding spends most of its time in lock
contention or waiting in the ready queue. BTW, there is no significant
difference in the cache miss ratio measured with cpustat(1).

Could the following be the reason? If not, please let me know how I can find
the cause of the above behavior.

Since the application has heavy inter-thread communication, some threads
need to wait for locks, and the binding configuration increases memory
traffic among the chips. Moreover, because of the memory latency, the delay
loop time (the delay before retrying a lock) grows exponentially, and
therefore threads spend most of their time waiting for locks.
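
To make the hypothesis above concrete, here is a minimal Python sketch of a
capped exponential backoff schedule of the kind a retrying lock might use.
It is only a model for discussion: the names backoff_delays, total_wait,
base and cap, and the doubling policy itself, are my assumptions and not the
actual Solaris mutex implementation.

```python
def backoff_delays(failed_attempts, base=1, cap=1024):
    """Return the delay (in arbitrary loop iterations) used before each retry.

    Hypothetical model: the delay starts at `base`, doubles after every
    failed attempt to take the lock, and is clamped at `cap`.
    """
    delays = []
    delay = base
    for _ in range(failed_attempts):
        delays.append(delay)
        delay = min(delay * 2, cap)
    return delays


def total_wait(failed_attempts, base=1, cap=1024):
    """Total time spent in delay loops before the lock is finally acquired."""
    return sum(backoff_delays(failed_attempts, base, cap))
```

Under this model, total_wait(n) grows roughly as 2^n until the cap is
reached, so even a few extra failed attempts per acquisition would multiply
the time a thread spends waiting.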

However, in the default configuration (no binding), the load is balanced well
by migrating threads among the cores, so threads get a chance to share the
lock data structures, which improves performance compared with the
binding configuration.

Please find the per-thread "prstat -Lm" output for both configurations below:

No-Binding (Default) Configuration
==================================
 PID USERNAME USR SYS TRP TFL DFL LCK SLP LAT VCX ICX SCL SIG PROCESS/LWPID 
 15637 user   93 0.2 0.0 0.0 0.0 6.6 0.0 0.1 186  34 437   0 myprogram/13
 15637 user   92 0.2 0.0 0.0 0.0 8.0 0.0 0.1 176  36 399   0 myprogram/11
 15637 user   91 0.2 0.0 0.0 0.0 8.8 0.0 0.1 201  34 398   0 myprogram/10
 15637 user   89 0.2 0.0 0.0 0.0  11 0.0 0.2 253  34 450   0 myprogram/12
 15637 user   87 0.2 0.0 0.0 0.0  13 0.0 0.1 194  34 414   0 myprogram/17
 15637 user   87 0.2 0.0 0.0 0.0  13 0.0 0.1 187  34 416   0 myprogram/9
 15637 user   86 0.2 0.0 0.0 0.0  13 0.0 0.1 188  34 420   0 myprogram/21
 15637 user   86 0.1 0.0 0.0 0.0  14 0.0 0.1 227  45 454   0 myprogram/3
 15637 user   86 0.2 0.0 0.0 0.0  14 0.0 0.1 215  37 443   0 myprogram/15
 15637 user   86 0.2 0.0 0.0 0.0  14 0.0 0.1 212  35 435   0 myprogram/7
 15637 user   85 0.2 0.0 0.0 0.0  14 0.0 0.3 258  43 520   0 myprogram/2
 15637 user   85 0.2 0.0 0.0 0.0  15 0.0 0.1 213  34 454   0 myprogram/5
 15637 user   85 0.2 0.0 0.0 0.0  15 0.0 0.1 216  80 438   0 myprogram/19
 15637 user   85 0.2 0.0 0.0 0.0  15 0.0 0.1 248  36 464   0 myprogram/6
 15637 user   84 0.2 0.0 0.0 0.0  15 0.0 0.1 257  35 474   0 myprogram/14
 15637 user   84 0.2 0.0 0.0 0.0  16 0.0 0.1 241  31 445   0 myprogram/18
 15637 user   83 0.2 0.0 0.0 0.0  17 0.0 0.2 256  30 467   0 myprogram/16
 15637 user   83 0.2 0.0 0.0 0.0  17 0.0 0.2 265  30 476   0 myprogram/8
 15637 user   83 0.2 0.0 0.0 0.0  17 0.0 0.2 257  31 467   0 myprogram/20
 15637 user   81 0.2 0.0 0.0 0.0  18 0.0 0.2 259  30 488   0 myprogram/4
 15637 user  0.0 0.0 0.0 0.0 0.0 0.0 100 0.0   0   0   0   0 myprogram/1


Binding (thread-to-core) Configuration
=======================================
  PID USERNAME USR SYS TRP TFL DFL LCK SLP LAT VCX ICX SCL SIG PROCESS/LWPID 
 15687 user  6.1 0.0 0.0 0.0 0.0  41 0.0  53  33   8  54   0 myprogram/13
 15687 user  5.7 0.0 0.0 0.0 0.0  32 0.0  62  31  10  38   0 myprogram/11
 15687 user  5.5 0.0 0.0 0.0 0.0  37 0.0  57  26  15  35   0 myprogram/10
 15687 user  5.4 0.0 0.0 0.0 0.0  47 0.0  47  34   6  78   0 myprogram/21
 15687 user  5.4 0.0 0.0 0.0 0.0  35 0.0  60  28  16  43   0 myprogram/17
 15687 user  5.2 0.0 0.0 0.0 0.0  42 0.0  53  33   6  59   0 myprogram/6
 15687 user  5.2 0.0 0.0 0.0 0.0  36 0.0  59  31   8  36   0 myprogram/15
 15687 user  5.2 0.0 0.0 0.0 0.0  56 0.0  39  36   7  72   0 myprogram/2
 15687 user  5.1 0.0 0.0 0.0 0.0  51 0.0  44  34   6  62   0 myprogram/5
 15687 user  5.0 0.0 0.0 0.0 0.0  50 0.0  45  33   6  54   0 myprogram/16
 15687 user  5.0 0.0 0.0 0.0 0.0  39 0.0  56  31   8  43   0 myprogram/7
 15687 user  4.9 0.0 0.0 0.0 0.0  38 0.0  57  33   7  41   0 myprogram/19
 15687 user  4.8 0.0 0.0 0.0 0.0  32 0.0  63  29  11  47   0 myprogram/12
 15687 user  4.7 0.0 0.0 0.0 0.0  43 0.0  53  31   8  36   0 myprogram/14
 15687 user  4.6 0.0 0.0 0.0 0.0  36 0.0  59  32   8  46   0 myprogram/8
 15687 user  4.5 0.0 0.0 0.0 0.0  51 0.0  45  33   5  63   0 myprogram/20
 15687 user  4.5 0.0 0.0 0.0 0.0  57 0.0  38  32   6  60   0 myprogram/18
 15687 user  4.4 0.0 0.0 0.0 0.0  59 0.0  37  31   7  66   0 myprogram/9
 15687 user  4.3 0.0 0.0 0.0 0.0  43 0.0  53  30   6  41   0 myprogram/3
 15687 user  4.3 0.0 0.0 0.0 0.0  43 0.0  53  33   5  57   0 myprogram/4
 15687 user  0.0 0.0 0.0 0.0 0.0 0.0 100 0.0   0   0   0   0 myprogram/1
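
In case it helps anyone compare the two listings, here is a small Python
sketch that averages a microstate column over the worker threads. It assumes
the whitespace-separated "prstat -Lm" layout shown above; the function names
parse_prstat_lm and column_mean are my own, and the rule of skipping threads
with SLP at 100 (such as LWP 1 here) is an assumption about which threads
count as workers.

```python
def parse_prstat_lm(text):
    """Parse `prstat -Lm` thread lines into a list of dicts keyed by column name."""
    columns = None
    rows = []
    for line in text.splitlines():
        fields = line.split()
        if not fields:
            continue
        if fields[0] == "PID":
            columns = fields            # header row names the columns
            continue
        if columns is None or len(fields) != len(columns):
            continue                    # skip titles, rulers, and other text
        rows.append(dict(zip(columns, fields)))
    return rows


def column_mean(rows, column, skip_idle=True):
    """Average a microstate column, optionally ignoring threads that only sleep."""
    values = [
        float(r[column]) for r in rows
        if not (skip_idle and float(r["SLP"]) >= 100.0)
    ]
    return sum(values) / len(values) if values else 0.0
```

For example, column_mean(parse_prstat_lm(listing), "LCK") gives the mean
lock-wait percentage for one listing, where listing is a string holding one
of the two outputs above.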
-- 
This message posted from opensolaris.org
_______________________________________________
perf-discuss mailing list
perf-discuss@opensolaris.org
