Hi everyone, I am using Ubuntu 12.04 with 64 AMD cores and an NVIDIA GT 620 (96 CUDA cores). I am a little confused about the performance of the OpenMP backend. Here are some test results from my server; the test case is cube_tet24 from Jin Seok Park's post [https://groups.google.com/forum/#!searchin/pyfrmailinglist/cblas$20serial$20better$20/pyfrmailinglist/osp16U_0UCE/QCVshUCaheMJ]:
1. OpenCL, 64 CPU cores: elapsed 00:38:52
2. OpenCL, 96 GPU cores: mem object allocation failure
3. CUDA, 96 GPU cores: out of memory
4. OpenMP, OMP_NUM_THREADS=32, cblas-type = serial: remaining 02:30:00
5. OpenMP, OMP_NUM_THREADS=32, cblas-type = parallel: remaining 14:31:07
6. MPI+OpenMP, OMP_NUM_THREADS=1, serial, 32 partitions: remaining 00:50:00

My questions:

1. CUDA is not usable because of the memory limit. Is it possible to work around this? I have 256 GB of RAM on the CPU side.
2. How should I interpret the OpenMP results? What is the difference between the parallel and serial cblas types?
3. I thought MPI was preferable on a cluster rather than on a single server. Why does MPI+OpenMP seem faster than OpenMP alone?
4. Why does OpenCL seem faster than the other working configurations?
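For reference, this is roughly how I set the cblas type in the configuration file for runs 4 and 5. This is a sketch based on my understanding of the OpenMP backend section; the library path is specific to my machine:

```ini
; OpenMP backend settings (library path is machine-specific)
[backend-openmp]
; path to a CBLAS shared library
cblas = /usr/lib/libcblas.so
; switched between "serial" and "parallel" for runs 4 and 5
cblas-type = serial
```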
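And for run 6, the MPI+OpenMP case, the workflow was along these lines. This is only a sketch of the commands I used; the mesh and config file names (mesh.pyfrm, config.ini) are placeholders, and the exact pyfr subcommands may differ between versions:

```shell
# Pin each MPI rank to a single OpenMP thread
export OMP_NUM_THREADS=1

# Split the mesh into 32 partitions (file names are placeholders)
pyfr partition 32 mesh.pyfrm .

# Launch one rank per partition with the OpenMP backend
mpirun -n 32 pyfr run -b openmp mesh.pyfrm config.ini
```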
