Hi Freddie, I still have some problems with the backends.
1. According to the output of clinfo (attached), I have two OpenCL
platforms. For the NVIDIA GT 620 GPU platform, clinfo.out indicates 96
cores and ~256MB RAM, but the output of nvidia-smi (below) shows
1024MB RAM. Is this a feature or a bug? The same thing happens on the
CPU platform: clinfo.out indicates 64 cores and ~63GB RAM, but I
actually have 256GB RAM.
---output of nvidia-smi---
Mon Jun 29 12:45:56 2015
+------------------------------------------------------+
| NVIDIA-SMI 346.46     Driver Version: 346.46         |
|-------------------------------+----------------------+----------------------+
| GPU  Name        Persistence-M| Bus-Id        Disp.A | Volatile Uncorr. ECC |
| Fan  Temp  Perf  Pwr:Usage/Cap|         Memory-Usage | GPU-Util  Compute M. |
|===============================+======================+======================|
|   0  GeForce GT 620      Off  | 0000:03:00.0     N/A |                  N/A |
| 63%   36C   P12    N/A /  N/A |     36MiB /  1023MiB |     N/A      Default |
+-------------------------------+----------------------+----------------------+
+-----------------------------------------------------------------------------+
| Processes:                                                       GPU Memory |
|  GPU       PID  Type  Process name                               Usage      |
|=============================================================================|
|    0            C     Not Supported                                         |
+-----------------------------------------------------------------------------+
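One possible explanation, stated here as an assumption rather than something confirmed from clinfo.out: the figure read from clinfo may be the maximum size of a single allocation rather than the total device memory. OpenCL only requires that limit to be at least a quarter of the global memory, and both mismatches above are close to a factor of four. A quick arithmetic check in Python, using only the numbers quoted in this message:

```python
def fraction_reported(clinfo_value, actual_total):
    """Ratio of the value seen in clinfo to the memory actually installed."""
    return clinfo_value / actual_total

# GPU: ~256 MB in clinfo versus 1024 MB in nvidia-smi.
gpu_fraction = fraction_reported(256, 1024)
# CPU: ~63 GB in clinfo versus 256 GB installed.
cpu_fraction = fraction_reported(63, 256)

print(f"GPU: {gpu_fraction:.3f}  CPU: {cpu_fraction:.3f}")
```

If both ratios sit near 0.25, it would be worth checking whether clinfo.out also lists a separate, larger "global memory size" entry for each device.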
2. According to the output of lscpu:
Architecture: x86_64
CPU op-mode(s): 32-bit, 64-bit
Byte Order: Little Endian
CPU(s): 64
On-line CPU(s) list: 0-63
Thread(s) per core: 2
Core(s) per socket: 8
Socket(s): 4
NUMA node(s): 8
Vendor ID: AuthenticAMD
CPU family: 21
Model: 1
Stepping: 2
CPU MHz: 1400.000
BogoMIPS: 5199.33
Virtualization: AMD-V
L1d cache: 16K
L1i cache: 64K
L2 cache: 2048K
L3 cache: 6144K
NUMA node0 CPU(s): 0-7
NUMA node1 CPU(s): 8-15
NUMA node2 CPU(s): 16-23
NUMA node3 CPU(s): 24-31
NUMA node4 CPU(s): 32-39
NUMA node5 CPU(s): 40-47
NUMA node6 CPU(s): 48-55
NUMA node7 CPU(s): 56-63
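The NUMA layout above can be read programmatically rather than by eye. The following is a minimal sketch that extracts the node-to-CPU mapping from lscpu text; the function name `parse_numa_cpus` is made up for illustration, and the sample string is just the relevant lines copied from the listing above.

```python
import re

def parse_numa_cpus(lscpu_text):
    """Return {node_index: cpu_list_string} from 'NUMA nodeN CPU(s):' lines."""
    nodes = {}
    for line in lscpu_text.splitlines():
        # 'NUMA node(s): 8' is skipped because 'node' must be followed by a digit.
        match = re.match(r"NUMA node(\d+) CPU\(s\):\s*(\S+)", line.strip())
        if match:
            nodes[int(match.group(1))] = match.group(2)
    return nodes

sample = """\
NUMA node(s): 8
NUMA node0 CPU(s): 0-7
NUMA node1 CPU(s): 8-15
NUMA node2 CPU(s): 16-23
NUMA node3 CPU(s): 24-31
NUMA node4 CPU(s): 32-39
NUMA node5 CPU(s): 40-47
NUMA node6 CPU(s): 48-55
NUMA node7 CPU(s): 56-63
"""

numa_map = parse_numa_cpus(sample)
for node, cpus in sorted(numa_map.items()):
    print(f"node {node}: CPUs {cpus}")
```

On this machine every node covers a contiguous block of eight CPU ids, so node N owns CPUs 8N to 8N+7.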
I have 8 NUMA nodes, so it should be better to partition the mesh into 8
parts and run each PyFR process with 8 threads. I tested this configuration:
$mpirun -n 8 pyfr run -b OPENMP -p cube_tet24.pyfrm config.ini
14.0% [====> ] 0.01/0.10 ela: 00:05:05 rem: 00:31:14
However, I then used the following command to check the affinity:
$ps -A -T -o cpuid,tid,pid,ppid,cmd|grep pyfr|sort -k3n
38 30785 30785 28363 mpirun -n 8 pyfr run -b OPENMP -p cube_tet24.pyfrm
config.ini
1 30786 30786 30785 /home/catdog/PyFR/venv/bin/python
/home/catdog/PyFR/venv/bin/pyfr run -b OPENMP -p cube_tet24.pyfrm config.ini
17 31296 30786 30785 /home/catdog/PyFR/venv/bin/python
/home/catdog/PyFR/venv/bin/pyfr run -b OPENMP -p cube_tet24.pyfrm config.ini
19 31298 30786 30785 /home/catdog/PyFR/venv/bin/python
/home/catdog/PyFR/venv/bin/pyfr run -b OPENMP -p cube_tet24.pyfrm config.ini
31 31300 30786 30785 /home/catdog/PyFR/venv/bin/python
/home/catdog/PyFR/venv/bin/pyfr run -b OPENMP -p cube_tet24.pyfrm config.ini
56 31302 30786 30785 /home/catdog/PyFR/venv/bin/python
/home/catdog/PyFR/venv/bin/pyfr run -b OPENMP -p cube_tet24.pyfrm config.ini
57 31297 30786 30785 /home/catdog/PyFR/venv/bin/python
/home/catdog/PyFR/venv/bin/pyfr run -b OPENMP -p cube_tet24.pyfrm config.ini
59 31301 30786 30785 /home/catdog/PyFR/venv/bin/python
/home/catdog/PyFR/venv/bin/pyfr run -b OPENMP -p cube_tet24.pyfrm config.ini
60 31299 30786 30785 /home/catdog/PyFR/venv/bin/python
/home/catdog/PyFR/venv/bin/pyfr run -b OPENMP -p cube_tet24.pyfrm config.ini
10 30787 30787 30785 /home/catdog/PyFR/venv/bin/python
/home/catdog/PyFR/venv/bin/pyfr run -b OPENMP -p cube_tet24.pyfrm config.ini
11 31251 30787 30785 /home/catdog/PyFR/venv/bin/python
/home/catdog/PyFR/venv/bin/pyfr run -b OPENMP -p cube_tet24.pyfrm config.ini
15 31253 30787 30785 /home/catdog/PyFR/venv/bin/python
/home/catdog/PyFR/venv/bin/pyfr run -b OPENMP -p cube_tet24.pyfrm config.ini
43 31250 30787 30785 /home/catdog/PyFR/venv/bin/python
/home/catdog/PyFR/venv/bin/pyfr run -b OPENMP -p cube_tet24.pyfrm config.ini
58 31252 30787 30785 /home/catdog/PyFR/venv/bin/python
/home/catdog/PyFR/venv/bin/pyfr run -b OPENMP -p cube_tet24.pyfrm config.ini
62 31247 30787 30785 /home/catdog/PyFR/venv/bin/python
/home/catdog/PyFR/venv/bin/pyfr run -b OPENMP -p cube_tet24.pyfrm config.ini
8 31249 30787 30785 /home/catdog/PyFR/venv/bin/python
/home/catdog/PyFR/venv/bin/pyfr run -b OPENMP -p cube_tet24.pyfrm config.ini
9 31248 30787 30785 /home/catdog/PyFR/venv/bin/python
/home/catdog/PyFR/venv/bin/pyfr run -b OPENMP -p cube_tet24.pyfrm config.ini
24 31286 30788 30785 /home/catdog/PyFR/venv/bin/python
/home/catdog/PyFR/venv/bin/pyfr run -b OPENMP -p cube_tet24.pyfrm config.ini
25 31282 30788 30785 /home/catdog/PyFR/venv/bin/python
/home/catdog/PyFR/venv/bin/pyfr run -b OPENMP -p cube_tet24.pyfrm config.ini
26 31283 30788 30785 /home/catdog/PyFR/venv/bin/python
/home/catdog/PyFR/venv/bin/pyfr run -b OPENMP -p cube_tet24.pyfrm config.ini
28 30788 30788 30785 /home/catdog/PyFR/venv/bin/python
/home/catdog/PyFR/venv/bin/pyfr run -b OPENMP -p cube_tet24.pyfrm config.ini
29 31285 30788 30785 /home/catdog/PyFR/venv/bin/python
/home/catdog/PyFR/venv/bin/pyfr run -b OPENMP -p cube_tet24.pyfrm config.ini
39 31288 30788 30785 /home/catdog/PyFR/venv/bin/python
/home/catdog/PyFR/venv/bin/pyfr run -b OPENMP -p cube_tet24.pyfrm config.ini
61 31284 30788 30785 /home/catdog/PyFR/venv/bin/python
/home/catdog/PyFR/venv/bin/pyfr run -b OPENMP -p cube_tet24.pyfrm config.ini
7 31287 30788 30785 /home/catdog/PyFR/venv/bin/python
/home/catdog/PyFR/venv/bin/pyfr run -b OPENMP -p cube_tet24.pyfrm config.ini
32 31291 30789 30785 /home/catdog/PyFR/venv/bin/python
/home/catdog/PyFR/venv/bin/pyfr run -b OPENMP -p cube_tet24.pyfrm config.ini
33 30789 30789 30785 /home/catdog/PyFR/venv/bin/python
/home/catdog/PyFR/venv/bin/pyfr run -b OPENMP -p cube_tet24.pyfrm config.ini
34 31295 30789 30785 /home/catdog/PyFR/venv/bin/python
/home/catdog/PyFR/venv/bin/pyfr run -b OPENMP -p cube_tet24.pyfrm config.ini
35 31294 30789 30785 /home/catdog/PyFR/venv/bin/python
/home/catdog/PyFR/venv/bin/pyfr run -b OPENMP -p cube_tet24.pyfrm config.ini
36 31292 30789 30785 /home/catdog/PyFR/venv/bin/python
/home/catdog/PyFR/venv/bin/pyfr run -b OPENMP -p cube_tet24.pyfrm config.ini
37 31289 30789 30785 /home/catdog/PyFR/venv/bin/python
/home/catdog/PyFR/venv/bin/pyfr run -b OPENMP -p cube_tet24.pyfrm config.ini
38 31293 30789 30785 /home/catdog/PyFR/venv/bin/python
/home/catdog/PyFR/venv/bin/pyfr run -b OPENMP -p cube_tet24.pyfrm config.ini
47 31290 30789 30785 /home/catdog/PyFR/venv/bin/python
/home/catdog/PyFR/venv/bin/pyfr run -b OPENMP -p cube_tet24.pyfrm config.ini
0 31261 30790 30785 /home/catdog/PyFR/venv/bin/python
/home/catdog/PyFR/venv/bin/pyfr run -b OPENMP -p cube_tet24.pyfrm config.ini
0 31265 30790 30785 /home/catdog/PyFR/venv/bin/python
/home/catdog/PyFR/venv/bin/pyfr run -b OPENMP -p cube_tet24.pyfrm config.ini
14 31263 30790 30785 /home/catdog/PyFR/venv/bin/python
/home/catdog/PyFR/venv/bin/pyfr run -b OPENMP -p cube_tet24.pyfrm config.ini
27 31267 30790 30785 /home/catdog/PyFR/venv/bin/python
/home/catdog/PyFR/venv/bin/pyfr run -b OPENMP -p cube_tet24.pyfrm config.ini
3 31266 30790 30785 /home/catdog/PyFR/venv/bin/python
/home/catdog/PyFR/venv/bin/pyfr run -b OPENMP -p cube_tet24.pyfrm config.ini
4 30790 30790 30785 /home/catdog/PyFR/venv/bin/python
/home/catdog/PyFR/venv/bin/pyfr run -b OPENMP -p cube_tet24.pyfrm config.ini
46 31264 30790 30785 /home/catdog/PyFR/venv/bin/python
/home/catdog/PyFR/venv/bin/pyfr run -b OPENMP -p cube_tet24.pyfrm config.ini
5 31262 30790 30785 /home/catdog/PyFR/venv/bin/python
/home/catdog/PyFR/venv/bin/pyfr run -b OPENMP -p cube_tet24.pyfrm config.ini
48 31276 30791 30785 /home/catdog/PyFR/venv/bin/python
/home/catdog/PyFR/venv/bin/pyfr run -b OPENMP -p cube_tet24.pyfrm config.ini
49 31277 30791 30785 /home/catdog/PyFR/venv/bin/python
/home/catdog/PyFR/venv/bin/pyfr run -b OPENMP -p cube_tet24.pyfrm config.ini
50 31273 30791 30785 /home/catdog/PyFR/venv/bin/python
/home/catdog/PyFR/venv/bin/pyfr run -b OPENMP -p cube_tet24.pyfrm config.ini
51 31279 30791 30785 /home/catdog/PyFR/venv/bin/python
/home/catdog/PyFR/venv/bin/pyfr run -b OPENMP -p cube_tet24.pyfrm config.ini
52 30791 30791 30785 /home/catdog/PyFR/venv/bin/python
/home/catdog/PyFR/venv/bin/pyfr run -b OPENMP -p cube_tet24.pyfrm config.ini
53 31278 30791 30785 /home/catdog/PyFR/venv/bin/python
/home/catdog/PyFR/venv/bin/pyfr run -b OPENMP -p cube_tet24.pyfrm config.ini
54 31274 30791 30785 /home/catdog/PyFR/venv/bin/python
/home/catdog/PyFR/venv/bin/pyfr run -b OPENMP -p cube_tet24.pyfrm config.ini
55 31275 30791 30785 /home/catdog/PyFR/venv/bin/python
/home/catdog/PyFR/venv/bin/pyfr run -b OPENMP -p cube_tet24.pyfrm config.ini
16 30792 30792 30785 /home/catdog/PyFR/venv/bin/python
/home/catdog/PyFR/venv/bin/pyfr run -b OPENMP -p cube_tet24.pyfrm config.ini
2 31270 30792 30785 /home/catdog/PyFR/venv/bin/python
/home/catdog/PyFR/venv/bin/pyfr run -b OPENMP -p cube_tet24.pyfrm config.ini
23 31272 30792 30785 /home/catdog/PyFR/venv/bin/python
/home/catdog/PyFR/venv/bin/pyfr run -b OPENMP -p cube_tet24.pyfrm config.ini
41 31281 30792 30785 /home/catdog/PyFR/venv/bin/python
/home/catdog/PyFR/venv/bin/pyfr run -b OPENMP -p cube_tet24.pyfrm config.ini
42 31268 30792 30785 /home/catdog/PyFR/venv/bin/python
/home/catdog/PyFR/venv/bin/pyfr run -b OPENMP -p cube_tet24.pyfrm config.ini
44 31269 30792 30785 /home/catdog/PyFR/venv/bin/python
/home/catdog/PyFR/venv/bin/pyfr run -b OPENMP -p cube_tet24.pyfrm config.ini
6 31271 30792 30785 /home/catdog/PyFR/venv/bin/python
/home/catdog/PyFR/venv/bin/pyfr run -b OPENMP -p cube_tet24.pyfrm config.ini
63 31280 30792 30785 /home/catdog/PyFR/venv/bin/python
/home/catdog/PyFR/venv/bin/pyfr run -b OPENMP -p cube_tet24.pyfrm config.ini
18 30793 30793 30785 /home/catdog/PyFR/venv/bin/python
/home/catdog/PyFR/venv/bin/pyfr run -b OPENMP -p cube_tet24.pyfrm config.ini
18 31255 30793 30785 /home/catdog/PyFR/venv/bin/python
/home/catdog/PyFR/venv/bin/pyfr run -b OPENMP -p cube_tet24.pyfrm config.ini
20 31256 30793 30785 /home/catdog/PyFR/venv/bin/python
/home/catdog/PyFR/venv/bin/pyfr run -b OPENMP -p cube_tet24.pyfrm config.ini
20 31259 30793 30785 /home/catdog/PyFR/venv/bin/python
/home/catdog/PyFR/venv/bin/pyfr run -b OPENMP -p cube_tet24.pyfrm config.ini
21 31258 30793 30785 /home/catdog/PyFR/venv/bin/python
/home/catdog/PyFR/venv/bin/pyfr run -b OPENMP -p cube_tet24.pyfrm config.ini
22 31254 30793 30785 /home/catdog/PyFR/venv/bin/python
/home/catdog/PyFR/venv/bin/pyfr run -b OPENMP -p cube_tet24.pyfrm config.ini
23 31257 30793 30785 /home/catdog/PyFR/venv/bin/python
/home/catdog/PyFR/venv/bin/pyfr run -b OPENMP -p cube_tet24.pyfrm config.ini
3 31260 30793 30785 /home/catdog/PyFR/venv/bin/python
/home/catdog/PyFR/venv/bin/pyfr run -b OPENMP -p cube_tet24.pyfrm config.ini
26 30819 30819 30791 /home/catdog/PyFR/venv/bin/python
/home/catdog/PyFR/venv/bin/pyfr run -b OPENMP -p cube_tet24.pyfrm config.ini
20 30820 30820 30792 /home/catdog/PyFR/venv/bin/python
/home/catdog/PyFR/venv/bin/pyfr run -b OPENMP -p cube_tet24.pyfrm config.ini
42 30821 30821 30793 /home/catdog/PyFR/venv/bin/python
/home/catdog/PyFR/venv/bin/pyfr run -b OPENMP -p cube_tet24.pyfrm config.ini
26 30822 30822 30786 /home/catdog/PyFR/venv/bin/python
/home/catdog/PyFR/venv/bin/pyfr run -b OPENMP -p cube_tet24.pyfrm config.ini
32 30823 30823 30788 /home/catdog/PyFR/venv/bin/python
/home/catdog/PyFR/venv/bin/pyfr run -b OPENMP -p cube_tet24.pyfrm config.ini
0 30824 30824 30789 /home/catdog/PyFR/venv/bin/python
/home/catdog/PyFR/venv/bin/pyfr run -b OPENMP -p cube_tet24.pyfrm config.ini
48 30825 30825 30790 /home/catdog/PyFR/venv/bin/python
/home/catdog/PyFR/venv/bin/pyfr run -b OPENMP -p cube_tet24.pyfrm config.ini
56 30826 30826 30787 /home/catdog/PyFR/venv/bin/python
/home/catdog/PyFR/venv/bin/pyfr run -b OPENMP -p cube_tet24.pyfrm config.ini
12 31456 31456 30103 grep --color=auto pyfr
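To summarise a listing like the one above, the threads can be grouped by PID and the NUMA nodes they touch counted. This is a rough sketch: it assumes node = cpuid // 8, which matches the lscpu layout on this machine only, and the rows are two of the eight ranks copied from the listing as (cpuid, tid, pid). Note that the cpuid column of ps shows where a thread last ran, not the set of CPUs it is allowed to run on.

```python
from collections import defaultdict

CPUS_PER_NODE = 8  # from lscpu: NUMA node N covers CPUs 8N..8N+7 on this machine

def nodes_per_pid(rows):
    """Map each pid to the sorted list of NUMA nodes its threads were seen on."""
    seen = defaultdict(set)
    for cpuid, _tid, pid in rows:
        seen[pid].add(cpuid // CPUS_PER_NODE)
    return {pid: sorted(nodes) for pid, nodes in seen.items()}

# (cpuid, tid, pid) rows for two ranks, taken from the ps output above.
rows = [
    (1, 30786, 30786), (17, 31296, 30786), (19, 31298, 30786),
    (31, 31300, 30786), (56, 31302, 30786), (57, 31297, 30786),
    (59, 31301, 30786), (60, 31299, 30786),
    (48, 31276, 30791), (49, 31277, 30791), (50, 31273, 30791),
    (51, 31279, 30791), (52, 30791, 30791), (53, 31278, 30791),
    (54, 31274, 30791), (55, 31275, 30791),
]

placement = nodes_per_pid(rows)
for pid, nodes in sorted(placement.items()):
    status = "single node" if len(nodes) == 1 else "spread across nodes"
    print(pid, nodes, status)
```

In this sample, PID 30791 happens to sit entirely on node 6, while PID 30786 is spread over nodes 0, 2, 3 and 7.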
The result suggests that the program does not automatically confine the
OpenMP threads of each rank to a single NUMA node, right? How can I
improve this? The OpenMP-related environment variable relevant to this
problem is GOMP_CPU_AFFINITY. Is it possible to use it to bind the OpenMP
threads of each process to one NUMA node?
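One way this might be tried is a small per-rank launcher that sets GOMP_CPU_AFFINITY before starting PyFR. The sketch below rests on several assumptions: the MPI library is Open MPI (which exports OMPI_COMM_WORLD_LOCAL_RANK to each rank), the OpenMP runtime is GCC's libgomp (the only one that reads GOMP_CPU_AFFINITY), and each NUMA node owns a contiguous block of eight CPU ids as in the lscpu output. The script name `numa_launch.py` and the helper `affinity_for_rank` are hypothetical.

```python
import os
import sys

CPUS_PER_NODE = 8  # assumption: NUMA node N covers CPUs 8N..8N+7, as lscpu reports

def affinity_for_rank(local_rank, cpus_per_node=CPUS_PER_NODE):
    """CPU range string for the NUMA node matching this rank, e.g. '8-15'."""
    first = local_rank * cpus_per_node
    return f"{first}-{first + cpus_per_node - 1}"

def main(argv):
    # Open MPI sets this variable for every rank it launches on a host.
    rank = int(os.environ.get("OMPI_COMM_WORLD_LOCAL_RANK", "0"))
    os.environ["GOMP_CPU_AFFINITY"] = affinity_for_rank(rank)
    os.environ["OMP_NUM_THREADS"] = str(CPUS_PER_NODE)
    # Replace this process with the real command so it inherits the environment.
    os.execvp(argv[0], argv)

if __name__ == "__main__" and len(sys.argv) > 1:
    main(sys.argv[1:])
```

It would be invoked along the lines of `mpirun -n 8 python numa_launch.py pyfr run -b OPENMP -p cube_tet24.pyfrm config.ini`. Any binding applied by mpirun itself may conflict with this and would need to be checked against the documentation of the MPI version in use.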
On Sunday, June 28, 2015 at 10:26:50 PM UTC+8, Freddie Witherden wrote:
>
> Hi,
>
> On 28/06/15 15:09, CatDog wrote:
> > 1. CUDA is not applicable because of memory limit, is it possible to
> > circumvent this problem? I have 256 GB ram for cpu.
>
> No. Generally this is not a problem in the sense that for real world
> simulations you'll almost always be compute -- as opposed to memory --
> bound. As a point of reference if you fully load up an NVIDIA K40c (12
> GiB of memory) with a simulation to get any reasonable statistics out of
> it you will probably need to run the simulation for three weeks or more.
>
> > 2. How to interpret the OPENMP results? what is the difference
> > between parallel and serial.
>
> The OpenMP results depend heavily on the configuration of your system
> and what BLAS library you're using. A key point is that OpenMP only
> performs well inside of a single NUMA zone.
>
> For instance, if you have 64 AMD cores in a single system then you
> probably have four sockets each with a 16 core CPU. Each of these CPUs
> will have two NUMA zones for a total of eight NUMA zones. Therefore,
> the optimal configuration is to partition the mesh into eight pieces and
> run each piece with four threads. Care is necessary to ensure that
> these threads are 'pinned' to the correct cores. Getting this right
> when using a combination of MPI + OpenMP on a single system can
> sometimes be painful.
>
> The parallel vs serial distinction depends on if the BLAS library you
> are using is multi-threaded or not. If it is multi-threaded then you'll
> want to set this to be parallel, otherwise serial. The recommendation
> is to use a single threaded BLAS library (ATLAS works best, followed by
> MKL, and then OpenBLAS) and let PyFR do the parallelism as opposed to
> the BLAS library itself.
>
>
> > 3. I thought MPI is favorable on cluster rather than on a single
> > server. Why MPI+OPENMP seems faster than using OPENMP solely?
>
> Practically a system with eight NUMA zones is basically eight separate
> systems with cache coherency.
>
> > 4. Why OPENCL seems faster than other available configuration?
>
> It is problem and system specific. In my experience when tuned
> correctly the OpenMP backend should be able to outperform the OpenCL
> backend at higher polynomial orders. However, it does require more work
> to configure.
>
> Regards, Freddie.
>
>
clinfo.out
Description: Binary data
