Hi everyone, I am using Ubuntu 12.04 with 64 AMD cores and an NVIDIA GT 620 (96 CUDA cores). I am a little confused about the performance of the OpenMP backend. Here are some test results from my server; the test case is cube_tet24 from Jin Seok Park's post [https://groups.google.com/forum/#!searchin/pyfrmailinglist/cblas$20serial$20better$20/pyfrmailinglist/osp16U_0UCE/QCVshUCaheMJ]:
1. OpenCL, 64 CPU cores: elapsed 00:38:52
2. OpenCL, 96 GPU cores: mem object allocation failure
3. CUDA, 96 GPU cores: out of memory
4. OpenMP, OMP_NUM_THREADS=32, cblas-type = serial: remaining 02:30:00
5. OpenMP, OMP_NUM_THREADS=32, cblas-type = parallel: remaining 14:31:07
6. MPI+OpenMP, OMP_NUM_THREADS=1, serial, 32 partitions: remaining 00:50:00

My questions:

1. CUDA is not usable because of the memory limit. Is it possible to work around this? I have 256 GB of RAM on the CPU side.
2. How should I interpret the OpenMP results? What is the difference between the parallel and serial cblas types?
3. I thought MPI was preferable on a cluster rather than on a single server. Why does MPI+OpenMP seem faster than OpenMP alone?
4. Why does OpenCL seem faster than the other working configurations?
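For reference, this is roughly how I set the cblas type in the configuration file for runs 4 and 5. This is a sketch based on my understanding of the OpenMP backend section; the library path is specific to my machine:

```ini
; OpenMP backend settings (library path is machine-specific)
[backend-openmp]
; path to a CBLAS shared library
cblas = /usr/lib/libcblas.so
; switched between "serial" and "parallel" for runs 4 and 5
cblas-type = serial
```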
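And for run 6, the MPI+OpenMP case, the workflow was along these lines. This is only a sketch of the commands I used; the mesh and config file names (mesh.pyfrm, config.ini) are placeholders, and the exact pyfr subcommands may differ between versions:

```shell
# Pin each MPI rank to a single OpenMP thread
export OMP_NUM_THREADS=1

# Split the mesh into 32 partitions (file names are placeholders)
pyfr partition 32 mesh.pyfrm .

# Launch one rank per partition with the OpenMP backend
mpirun -n 32 pyfr run -b openmp mesh.pyfrm config.ini
```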
