Hi, Freddie

I still have some problems with the backends.

1. According to the output of clinfo (attached), I have two OpenCL 
platforms. For the NVIDIA GT 620 GPU platform, clinfo.out reports 96 
cores and ~256 MB of memory, but nvidia-smi (output below) reports 
1023 MiB, i.e. about 1 GB. Is this a feature or a bug? The same thing 
happens on the CPU platform: clinfo.out reports 64 cores and ~63 GB of 
memory, but the machine actually has 256 GB of RAM.

---output of nvidia-smi---
Mon Jun 29 12:45:56 2015
+------------------------------------------------------+
| NVIDIA-SMI 346.46     Driver Version: 346.46         |
|-------------------------------+----------------------+----------------------+
| GPU  Name        Persistence-M| Bus-Id        Disp.A | Volatile Uncorr. ECC |
| Fan  Temp  Perf  Pwr:Usage/Cap|         Memory-Usage | GPU-Util  Compute M. |
|===============================+======================+======================|
|   0  GeForce GT 620      Off  | 0000:03:00.0     N/A |                  N/A |
| 63%   36C   P12    N/A /  N/A |     36MiB /  1023MiB |     N/A      Default |
+-------------------------------+----------------------+----------------------+

+-----------------------------------------------------------------------------+
| Processes:                                                       GPU Memory |
|  GPU       PID  Type  Process name                               Usage      |
|=============================================================================|
|    0              C   Not Supported                                         |
+-----------------------------------------------------------------------------+
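A possible explanation for item 1 (an assumption, not confirmed in this thread): clinfo may be showing the per-buffer limit CL_DEVICE_MAX_MEM_ALLOC_SIZE rather than the total CL_DEVICE_GLOBAL_MEM_SIZE. The OpenCL 1.2 specification only requires the former to be at least max(1/4 of global memory, 128 MiB), and many implementations report exactly that quarter. The quick arithmetic below, using the nvidia-smi total and the installed RAM, gives about 256 MiB for the GPU and 64 GiB for the CPU, close to the ~256 MB and ~63 GB clinfo figures. The helper name min_max_alloc is hypothetical.

```python
MIB = 1024 ** 2
GIB = 1024 ** 3


def min_max_alloc(global_mem_bytes):
    """Smallest CL_DEVICE_MAX_MEM_ALLOC_SIZE that OpenCL 1.2 permits for a
    non-custom device with the given CL_DEVICE_GLOBAL_MEM_SIZE."""
    return max(global_mem_bytes // 4, 128 * MIB)


# Totals from the report above: nvidia-smi lists 1023 MiB on the GT 620,
# and the host has 256 GB of RAM (treated here as 256 GiB).
gpu_global = 1023 * MIB
cpu_global = 256 * GIB

print(f"GPU: quarter of global memory = {min_max_alloc(gpu_global) / MIB:.0f} MiB")
print(f"CPU: quarter of global memory = {min_max_alloc(cpu_global) / GIB:.0f} GiB")
```

The small gap on the CPU side (64 vs ~63 GB) would be consistent with the OS exposing slightly less than the nominal 256 GB, but that is a guess. clinfo normally prints both a global memory size and a maximum allocation size per device, so checking which line the ~256 MB figure comes from would settle the question.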


2. According to the output of lscpu:
Architecture:          x86_64
CPU op-mode(s):        32-bit, 64-bit
Byte Order:            Little Endian
CPU(s):                64
On-line CPU(s) list:   0-63
Thread(s) per core:    2
Core(s) per socket:    8
Socket(s):             4
NUMA node(s):          8
Vendor ID:             AuthenticAMD
CPU family:            21
Model:                 1
Stepping:              2
CPU MHz:               1400.000
BogoMIPS:              5199.33
Virtualization:        AMD-V
L1d cache:             16K
L1i cache:             64K
L2 cache:              2048K
L3 cache:              6144K
NUMA node0 CPU(s):     0-7
NUMA node1 CPU(s):     8-15
NUMA node2 CPU(s):     16-23
NUMA node3 CPU(s):     24-31
NUMA node4 CPU(s):     32-39
NUMA node5 CPU(s):     40-47
NUMA node6 CPU(s):     48-55
NUMA node7 CPU(s):     56-63
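The "8 NUMA nodes, so 8 ranks with 8 threads each" reasoning can be read straight off this output. A minimal sketch, assuming the `NUMA nodeN CPU(s):` line format shown above; the helper names parse_cpu_list and numa_cpus are hypothetical.

```python
import re


def parse_cpu_list(spec):
    """Expand an lscpu CPU list such as '0-7' or '0-3,8-11' into integers."""
    cpus = []
    for part in spec.split(","):
        if "-" in part:
            lo, hi = part.split("-")
            cpus.extend(range(int(lo), int(hi) + 1))
        else:
            cpus.append(int(part))
    return cpus


def numa_cpus(lscpu_text):
    """Map NUMA node id -> logical CPU ids, parsed from `lscpu` output."""
    nodes = {}
    for line in lscpu_text.splitlines():
        # Matches "NUMA node3 CPU(s):     24-31" but not "NUMA node(s):  8".
        m = re.match(r"NUMA node(\d+) CPU\(s\):\s*(\S+)", line)
        if m:
            nodes[int(m.group(1))] = parse_cpu_list(m.group(2))
    return nodes


# The NUMA lines from the lscpu output above.
LSCPU = """\
NUMA node(s):          8
NUMA node0 CPU(s):     0-7
NUMA node1 CPU(s):     8-15
NUMA node2 CPU(s):     16-23
NUMA node3 CPU(s):     24-31
NUMA node4 CPU(s):     32-39
NUMA node5 CPU(s):     40-47
NUMA node6 CPU(s):     48-55
NUMA node7 CPU(s):     56-63
"""

nodes = numa_cpus(LSCPU)
ranks = len(nodes)                                    # one MPI rank per NUMA node
threads = min(len(cpus) for cpus in nodes.values())  # one thread per logical CPU
print(f"{ranks} MPI ranks x {threads} OpenMP threads")
```

On the machine itself the text could come from `subprocess.run(["lscpu"], stdout=subprocess.PIPE, universal_newlines=True).stdout` instead of a pasted string. Since lscpu reports 2 threads per core here, 8 is a count of logical CPUs; whether fewer threads per rank (one per core) runs faster would need to be measured.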

I have 8 NUMA nodes, so it should be better to partition the mesh into 8 
parts and run each PyFR process with 8 threads. I tested this configuration:

$ mpirun -n 8 pyfr run -b OPENMP -p cube_tet24.pyfrm config.ini
  14.0% [====>                           ] 0.01/0.10 ela: 00:05:05 rem: 00:31:14

I then used the following command to check the thread affinity:
$ ps -A -T -o cpuid,tid,pid,ppid,cmd | grep pyfr | sort -k3n
   38 30785 30785 28363 mpirun -n 8 pyfr run -b OPENMP -p cube_tet24.pyfrm config.ini
    1 30786 30786 30785 /home/catdog/PyFR/venv/bin/python /home/catdog/PyFR/venv/bin/pyfr run -b OPENMP -p cube_tet24.pyfrm config.ini
   17 31296 30786 30785 /home/catdog/PyFR/venv/bin/python /home/catdog/PyFR/venv/bin/pyfr run -b OPENMP -p cube_tet24.pyfrm config.ini
   19 31298 30786 30785 /home/catdog/PyFR/venv/bin/python /home/catdog/PyFR/venv/bin/pyfr run -b OPENMP -p cube_tet24.pyfrm config.ini
   31 31300 30786 30785 /home/catdog/PyFR/venv/bin/python /home/catdog/PyFR/venv/bin/pyfr run -b OPENMP -p cube_tet24.pyfrm config.ini
   56 31302 30786 30785 /home/catdog/PyFR/venv/bin/python /home/catdog/PyFR/venv/bin/pyfr run -b OPENMP -p cube_tet24.pyfrm config.ini
   57 31297 30786 30785 /home/catdog/PyFR/venv/bin/python /home/catdog/PyFR/venv/bin/pyfr run -b OPENMP -p cube_tet24.pyfrm config.ini
   59 31301 30786 30785 /home/catdog/PyFR/venv/bin/python /home/catdog/PyFR/venv/bin/pyfr run -b OPENMP -p cube_tet24.pyfrm config.ini
   60 31299 30786 30785 /home/catdog/PyFR/venv/bin/python /home/catdog/PyFR/venv/bin/pyfr run -b OPENMP -p cube_tet24.pyfrm config.ini
   10 30787 30787 30785 /home/catdog/PyFR/venv/bin/python /home/catdog/PyFR/venv/bin/pyfr run -b OPENMP -p cube_tet24.pyfrm config.ini
   11 31251 30787 30785 /home/catdog/PyFR/venv/bin/python /home/catdog/PyFR/venv/bin/pyfr run -b OPENMP -p cube_tet24.pyfrm config.ini
   15 31253 30787 30785 /home/catdog/PyFR/venv/bin/python /home/catdog/PyFR/venv/bin/pyfr run -b OPENMP -p cube_tet24.pyfrm config.ini
   43 31250 30787 30785 /home/catdog/PyFR/venv/bin/python /home/catdog/PyFR/venv/bin/pyfr run -b OPENMP -p cube_tet24.pyfrm config.ini
   58 31252 30787 30785 /home/catdog/PyFR/venv/bin/python /home/catdog/PyFR/venv/bin/pyfr run -b OPENMP -p cube_tet24.pyfrm config.ini
   62 31247 30787 30785 /home/catdog/PyFR/venv/bin/python /home/catdog/PyFR/venv/bin/pyfr run -b OPENMP -p cube_tet24.pyfrm config.ini
    8 31249 30787 30785 /home/catdog/PyFR/venv/bin/python /home/catdog/PyFR/venv/bin/pyfr run -b OPENMP -p cube_tet24.pyfrm config.ini
    9 31248 30787 30785 /home/catdog/PyFR/venv/bin/python /home/catdog/PyFR/venv/bin/pyfr run -b OPENMP -p cube_tet24.pyfrm config.ini
   24 31286 30788 30785 /home/catdog/PyFR/venv/bin/python /home/catdog/PyFR/venv/bin/pyfr run -b OPENMP -p cube_tet24.pyfrm config.ini
   25 31282 30788 30785 /home/catdog/PyFR/venv/bin/python /home/catdog/PyFR/venv/bin/pyfr run -b OPENMP -p cube_tet24.pyfrm config.ini
   26 31283 30788 30785 /home/catdog/PyFR/venv/bin/python /home/catdog/PyFR/venv/bin/pyfr run -b OPENMP -p cube_tet24.pyfrm config.ini
   28 30788 30788 30785 /home/catdog/PyFR/venv/bin/python /home/catdog/PyFR/venv/bin/pyfr run -b OPENMP -p cube_tet24.pyfrm config.ini
   29 31285 30788 30785 /home/catdog/PyFR/venv/bin/python /home/catdog/PyFR/venv/bin/pyfr run -b OPENMP -p cube_tet24.pyfrm config.ini
   39 31288 30788 30785 /home/catdog/PyFR/venv/bin/python /home/catdog/PyFR/venv/bin/pyfr run -b OPENMP -p cube_tet24.pyfrm config.ini
   61 31284 30788 30785 /home/catdog/PyFR/venv/bin/python /home/catdog/PyFR/venv/bin/pyfr run -b OPENMP -p cube_tet24.pyfrm config.ini
    7 31287 30788 30785 /home/catdog/PyFR/venv/bin/python /home/catdog/PyFR/venv/bin/pyfr run -b OPENMP -p cube_tet24.pyfrm config.ini
   32 31291 30789 30785 /home/catdog/PyFR/venv/bin/python /home/catdog/PyFR/venv/bin/pyfr run -b OPENMP -p cube_tet24.pyfrm config.ini
   33 30789 30789 30785 /home/catdog/PyFR/venv/bin/python /home/catdog/PyFR/venv/bin/pyfr run -b OPENMP -p cube_tet24.pyfrm config.ini
   34 31295 30789 30785 /home/catdog/PyFR/venv/bin/python /home/catdog/PyFR/venv/bin/pyfr run -b OPENMP -p cube_tet24.pyfrm config.ini
   35 31294 30789 30785 /home/catdog/PyFR/venv/bin/python /home/catdog/PyFR/venv/bin/pyfr run -b OPENMP -p cube_tet24.pyfrm config.ini
   36 31292 30789 30785 /home/catdog/PyFR/venv/bin/python /home/catdog/PyFR/venv/bin/pyfr run -b OPENMP -p cube_tet24.pyfrm config.ini
   37 31289 30789 30785 /home/catdog/PyFR/venv/bin/python /home/catdog/PyFR/venv/bin/pyfr run -b OPENMP -p cube_tet24.pyfrm config.ini
   38 31293 30789 30785 /home/catdog/PyFR/venv/bin/python /home/catdog/PyFR/venv/bin/pyfr run -b OPENMP -p cube_tet24.pyfrm config.ini
   47 31290 30789 30785 /home/catdog/PyFR/venv/bin/python /home/catdog/PyFR/venv/bin/pyfr run -b OPENMP -p cube_tet24.pyfrm config.ini
    0 31261 30790 30785 /home/catdog/PyFR/venv/bin/python /home/catdog/PyFR/venv/bin/pyfr run -b OPENMP -p cube_tet24.pyfrm config.ini
    0 31265 30790 30785 /home/catdog/PyFR/venv/bin/python /home/catdog/PyFR/venv/bin/pyfr run -b OPENMP -p cube_tet24.pyfrm config.ini
   14 31263 30790 30785 /home/catdog/PyFR/venv/bin/python /home/catdog/PyFR/venv/bin/pyfr run -b OPENMP -p cube_tet24.pyfrm config.ini
   27 31267 30790 30785 /home/catdog/PyFR/venv/bin/python /home/catdog/PyFR/venv/bin/pyfr run -b OPENMP -p cube_tet24.pyfrm config.ini
    3 31266 30790 30785 /home/catdog/PyFR/venv/bin/python /home/catdog/PyFR/venv/bin/pyfr run -b OPENMP -p cube_tet24.pyfrm config.ini
    4 30790 30790 30785 /home/catdog/PyFR/venv/bin/python /home/catdog/PyFR/venv/bin/pyfr run -b OPENMP -p cube_tet24.pyfrm config.ini
   46 31264 30790 30785 /home/catdog/PyFR/venv/bin/python /home/catdog/PyFR/venv/bin/pyfr run -b OPENMP -p cube_tet24.pyfrm config.ini
    5 31262 30790 30785 /home/catdog/PyFR/venv/bin/python /home/catdog/PyFR/venv/bin/pyfr run -b OPENMP -p cube_tet24.pyfrm config.ini
   48 31276 30791 30785 /home/catdog/PyFR/venv/bin/python /home/catdog/PyFR/venv/bin/pyfr run -b OPENMP -p cube_tet24.pyfrm config.ini
   49 31277 30791 30785 /home/catdog/PyFR/venv/bin/python /home/catdog/PyFR/venv/bin/pyfr run -b OPENMP -p cube_tet24.pyfrm config.ini
   50 31273 30791 30785 /home/catdog/PyFR/venv/bin/python /home/catdog/PyFR/venv/bin/pyfr run -b OPENMP -p cube_tet24.pyfrm config.ini
   51 31279 30791 30785 /home/catdog/PyFR/venv/bin/python /home/catdog/PyFR/venv/bin/pyfr run -b OPENMP -p cube_tet24.pyfrm config.ini
   52 30791 30791 30785 /home/catdog/PyFR/venv/bin/python /home/catdog/PyFR/venv/bin/pyfr run -b OPENMP -p cube_tet24.pyfrm config.ini
   53 31278 30791 30785 /home/catdog/PyFR/venv/bin/python /home/catdog/PyFR/venv/bin/pyfr run -b OPENMP -p cube_tet24.pyfrm config.ini
   54 31274 30791 30785 /home/catdog/PyFR/venv/bin/python /home/catdog/PyFR/venv/bin/pyfr run -b OPENMP -p cube_tet24.pyfrm config.ini
   55 31275 30791 30785 /home/catdog/PyFR/venv/bin/python /home/catdog/PyFR/venv/bin/pyfr run -b OPENMP -p cube_tet24.pyfrm config.ini
   16 30792 30792 30785 /home/catdog/PyFR/venv/bin/python /home/catdog/PyFR/venv/bin/pyfr run -b OPENMP -p cube_tet24.pyfrm config.ini
    2 31270 30792 30785 /home/catdog/PyFR/venv/bin/python /home/catdog/PyFR/venv/bin/pyfr run -b OPENMP -p cube_tet24.pyfrm config.ini
   23 31272 30792 30785 /home/catdog/PyFR/venv/bin/python /home/catdog/PyFR/venv/bin/pyfr run -b OPENMP -p cube_tet24.pyfrm config.ini
   41 31281 30792 30785 /home/catdog/PyFR/venv/bin/python /home/catdog/PyFR/venv/bin/pyfr run -b OPENMP -p cube_tet24.pyfrm config.ini
   42 31268 30792 30785 /home/catdog/PyFR/venv/bin/python /home/catdog/PyFR/venv/bin/pyfr run -b OPENMP -p cube_tet24.pyfrm config.ini
   44 31269 30792 30785 /home/catdog/PyFR/venv/bin/python /home/catdog/PyFR/venv/bin/pyfr run -b OPENMP -p cube_tet24.pyfrm config.ini
    6 31271 30792 30785 /home/catdog/PyFR/venv/bin/python /home/catdog/PyFR/venv/bin/pyfr run -b OPENMP -p cube_tet24.pyfrm config.ini
   63 31280 30792 30785 /home/catdog/PyFR/venv/bin/python /home/catdog/PyFR/venv/bin/pyfr run -b OPENMP -p cube_tet24.pyfrm config.ini
   18 30793 30793 30785 /home/catdog/PyFR/venv/bin/python /home/catdog/PyFR/venv/bin/pyfr run -b OPENMP -p cube_tet24.pyfrm config.ini
   18 31255 30793 30785 /home/catdog/PyFR/venv/bin/python /home/catdog/PyFR/venv/bin/pyfr run -b OPENMP -p cube_tet24.pyfrm config.ini
   20 31256 30793 30785 /home/catdog/PyFR/venv/bin/python /home/catdog/PyFR/venv/bin/pyfr run -b OPENMP -p cube_tet24.pyfrm config.ini
   20 31259 30793 30785 /home/catdog/PyFR/venv/bin/python /home/catdog/PyFR/venv/bin/pyfr run -b OPENMP -p cube_tet24.pyfrm config.ini
   21 31258 30793 30785 /home/catdog/PyFR/venv/bin/python /home/catdog/PyFR/venv/bin/pyfr run -b OPENMP -p cube_tet24.pyfrm config.ini
   22 31254 30793 30785 /home/catdog/PyFR/venv/bin/python /home/catdog/PyFR/venv/bin/pyfr run -b OPENMP -p cube_tet24.pyfrm config.ini
   23 31257 30793 30785 /home/catdog/PyFR/venv/bin/python /home/catdog/PyFR/venv/bin/pyfr run -b OPENMP -p cube_tet24.pyfrm config.ini
    3 31260 30793 30785 /home/catdog/PyFR/venv/bin/python /home/catdog/PyFR/venv/bin/pyfr run -b OPENMP -p cube_tet24.pyfrm config.ini
   26 30819 30819 30791 /home/catdog/PyFR/venv/bin/python /home/catdog/PyFR/venv/bin/pyfr run -b OPENMP -p cube_tet24.pyfrm config.ini
   20 30820 30820 30792 /home/catdog/PyFR/venv/bin/python /home/catdog/PyFR/venv/bin/pyfr run -b OPENMP -p cube_tet24.pyfrm config.ini
   42 30821 30821 30793 /home/catdog/PyFR/venv/bin/python /home/catdog/PyFR/venv/bin/pyfr run -b OPENMP -p cube_tet24.pyfrm config.ini
   26 30822 30822 30786 /home/catdog/PyFR/venv/bin/python /home/catdog/PyFR/venv/bin/pyfr run -b OPENMP -p cube_tet24.pyfrm config.ini
   32 30823 30823 30788 /home/catdog/PyFR/venv/bin/python /home/catdog/PyFR/venv/bin/pyfr run -b OPENMP -p cube_tet24.pyfrm config.ini
    0 30824 30824 30789 /home/catdog/PyFR/venv/bin/python /home/catdog/PyFR/venv/bin/pyfr run -b OPENMP -p cube_tet24.pyfrm config.ini
   48 30825 30825 30790 /home/catdog/PyFR/venv/bin/python /home/catdog/PyFR/venv/bin/pyfr run -b OPENMP -p cube_tet24.pyfrm config.ini
   56 30826 30826 30787 /home/catdog/PyFR/venv/bin/python /home/catdog/PyFR/venv/bin/pyfr run -b OPENMP -p cube_tet24.pyfrm config.ini
   12 31456 31456 30103 grep --color=auto pyfr
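The listing can be summarised per rank by mapping each CPU id to its NUMA node (node k holds CPUs 8k to 8k+7, per the lscpu output). A minimal sketch; the names nodes_per_rank and CPUS_PER_NODE are hypothetical. The cpuid column only records the CPU a thread last ran on, not the set it is allowed to run on (`taskset -cp <tid>` prints that). On the two ranks used below, pid 30786 has threads on nodes 0, 2, 3 and 7, while pid 30791 happens to stay on node 6.

```python
from collections import defaultdict

CPUS_PER_NODE = 8  # this machine only: NUMA node k holds CPUs 8k..8k+7 (lscpu)


def nodes_per_rank(ps_text):
    """Group `ps -T -o cpuid,tid,pid,ppid,cmd` rows by pid and return
    pid -> sorted NUMA nodes on which that process's threads were last seen."""
    seen = defaultdict(set)
    for line in ps_text.splitlines():
        fields = line.split(None, 4)
        if len(fields) < 5 or not fields[0].isdigit():
            continue
        cpu, pid, cmd = int(fields[0]), int(fields[2]), fields[4]
        # Keep the pyfr processes; skip mpirun itself and the grep.
        if "pyfr run" in cmd and not cmd.startswith("mpirun"):
            seen[pid].add(cpu // CPUS_PER_NODE)
    return {pid: sorted(nodes) for pid, nodes in seen.items()}


CMD = ("/home/catdog/PyFR/venv/bin/python /home/catdog/PyFR/venv/bin/pyfr "
       "run -b OPENMP -p cube_tet24.pyfrm config.ini")
# (cpuid, tid, pid, ppid) for two of the ranks in the listing above.
ROWS = [
    (1, 30786, 30786, 30785), (17, 31296, 30786, 30785),
    (19, 31298, 30786, 30785), (31, 31300, 30786, 30785),
    (56, 31302, 30786, 30785), (57, 31297, 30786, 30785),
    (59, 31301, 30786, 30785), (60, 31299, 30786, 30785),
    (48, 31276, 30791, 30785), (49, 31277, 30791, 30785),
    (50, 31273, 30791, 30785), (51, 31279, 30791, 30785),
    (52, 30791, 30791, 30785), (53, 31278, 30791, 30785),
    (54, 31274, 30791, 30785), (55, 31275, 30791, 30785),
]
PS = "\n".join(f"{c:5d} {t} {p} {pp} {CMD}" for c, t, p, pp in ROWS)

for pid, nodes in sorted(nodes_per_rank(PS).items()):
    print(pid, "threads last seen on NUMA nodes", nodes)
```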

The result suggests that the OpenMP threads of each rank are not kept 
within a single NUMA node, right? How can I improve this? The relevant 
OpenMP environment variable seems to be GOMP_CPU_AFFINITY. Is it possible 
to use it to bind each rank's OpenMP threads to one NUMA node?
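GOMP_CPU_AFFINITY is read by GNU libgomp, so it only applies if PyFR's OpenMP kernels are built with GCC (an assumption about this setup). Because each rank needs a different CPU list, it cannot simply be exported once before mpirun; one approach is a small per-rank wrapper. A minimal sketch, assuming the launcher exports a node-local rank in one of the variables listed and that NUMA node k holds CPUs 8k to 8k+7 as lscpu shows; the script name numa_bind.py and its helpers are hypothetical.

```python
import os
import sys

CPUS_PER_NODE = 8  # this machine only: NUMA node k holds CPUs 8k..8k+7 (lscpu)

# Variables in which common launchers publish the node-local rank. Assumption:
# confirm which one your mpirun actually sets (e.g. `mpirun -n 1 env`).
LOCAL_RANK_VARS = (
    "OMPI_COMM_WORLD_LOCAL_RANK",  # Open MPI
    "MV2_COMM_WORLD_LOCAL_RANK",   # MVAPICH2
    "MPI_LOCALRANKID",             # MPICH (Hydra)
)


def affinity_env(local_rank, cpus_per_node=CPUS_PER_NODE):
    """libgomp settings that keep one rank's threads on NUMA node `local_rank`."""
    first = local_rank * cpus_per_node
    last = first + cpus_per_node - 1
    return {
        "OMP_NUM_THREADS": str(cpus_per_node),
        # libgomp binds thread i to the i-th CPU in this list.
        "GOMP_CPU_AFFINITY": f"{first}-{last}",
    }


def local_rank():
    for var in LOCAL_RANK_VARS:
        if var in os.environ:
            return int(os.environ[var])
    raise RuntimeError("no local-rank variable set; tried " + ", ".join(LOCAL_RANK_VARS))


def main(argv):
    if len(argv) < 2:
        print("usage: mpirun -n 8 python numa_bind.py pyfr run ...")
        return
    os.environ.update(affinity_env(local_rank()))
    # Replace this wrapper with the real command; it inherits the new environment.
    os.execvp(argv[1], argv[1:])


if __name__ == "__main__":
    main(sys.argv)
```

It would be launched as `mpirun -n 8 python numa_bind.py pyfr run -b OPENMP -p cube_tet24.pyfrm config.ini`. GOMP_CPU_AFFINITY pins threads but not memory; wrapping the command in `numactl --cpunodebind=N --membind=N` would also keep each rank's allocations on its own node, and Open MPI 1.8 and later accept `--map-by numa --bind-to numa` for a similar per-rank restriction without a wrapper (check `mpirun --help` for the installed version).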


On Sunday, June 28, 2015 at 10:26:50 PM UTC+8, Freddie Witherden wrote:
>
> Hi, 
>
> On 28/06/15 15:09, CatDog wrote: 
> > 1. CUDA is not applicable because of the memory limit; is it possible to 
> > circumvent this problem? I have 256 GB of RAM for the CPU. 
>
> No.  Generally this is not a problem in the sense that for real world 
> simulations you'll almost always be compute -- as opposed to memory -- 
> bound.  As a point of reference if you fully load up an NVIDIA K40c (12 
> GiB of memory) with a simulation to get any reasonable statistics out of 
> it you will probably need to run the simulation for three weeks or more. 
>
> > 2. How should I interpret the OPENMP results? What is the difference 
> > between parallel and serial? 
>
> The OpenMP results depend heavily on the configuration of your system 
> and what BLAS library you're using.  A key point is that OpenMP only 
> performs well inside of a single NUMA zone. 
>
> For instance, if you have 64 AMD cores in a single system then you 
> probably have four sockets each with a 16 core CPU.  Each of these CPUs 
> will have two NUMA zones for a total of eight NUMA zones.  Therefore, 
> the optimal configuration is to partition the mesh into eight pieces and 
> run each piece with four threads.  Care is necessary to ensure that 
> these threads are 'pinned' to the correct cores.  Getting this right 
> when using a combination of MPI + OpenMP on a single system can 
> sometimes be painful. 
>
> The parallel vs serial distinction depends on whether the BLAS library you 
> are using is multi-threaded or not.  If it is multi-threaded then you'll 
> want to set this to be parallel, otherwise serial.  The recommendation 
> is to use a single threaded BLAS library (ATLAS works best, followed by 
> MKL, and then OpenBLAS) and let PyFR do the parallelism as opposed to 
> the BLAS library itself. 
>
>
> > 3. I thought MPI was preferable on a cluster rather than on a single 
> > server. Why does MPI+OPENMP seem faster than using OPENMP alone? 
>
> In practice, a system with eight NUMA zones is basically eight separate 
> systems with cache coherency. 
>
> > 4. Why does OPENCL seem faster than the other available configurations? 
>
> It is problem and system specific.  In my experience when tuned 
> correctly the OpenMP backend should be able to outperform the OpenCL 
> backend at higher polynomial orders.  However, it does require more work 
> to configure. 
>
> Regards, Freddie. 
>
>

-- 
You received this message because you are subscribed to the Google Groups "PyFR 
Mailing List" group.
Visit this group at http://groups.google.com/group/pyfrmailinglist.

Attachment: clinfo.out
Description: Binary data
