Hi,

On 28/06/15 15:09, CatDog wrote:
> 1. CUDA is not applicable because of memory limit, is it possible to
> circumvent this problem? I have 256 GB ram for cpu.

No. Generally this is not a problem in the sense that, for real-world
simulations, you'll almost always be compute -- as opposed to memory --
bound. As a point of reference, if you fully load up an NVIDIA K40c
(12 GiB of memory) with a simulation, then to get any reasonable
statistics out of it you will probably need to run the simulation for
three weeks or more.

> 2. How to interpret the OPENMP results? what is the difference
> between parallel and serial.

The OpenMP results depend heavily on the configuration of your system
and what BLAS library you're using. A key point is that OpenMP only
performs well inside of a single NUMA zone. For instance, if you have
64 AMD cores in a single system then you probably have four sockets,
each with a 16-core CPU. Each of these CPUs will have two NUMA zones,
for a total of eight NUMA zones. Therefore, the optimal configuration
is to partition the mesh into eight pieces and run each piece with four
threads. Care is necessary to ensure that these threads are 'pinned' to
the correct cores. Getting this right when using a combination of
MPI + OpenMP on a single system can sometimes be painful.

The parallel vs serial distinction depends on whether the BLAS library
you are using is multi-threaded or not. If it is multi-threaded then
you'll want to set this to be parallel, otherwise serial. The
recommendation is to use a single-threaded BLAS library (ATLAS works
best, followed by MKL, and then OpenBLAS) and let PyFR do the
parallelism as opposed to the BLAS library itself.

> 3. I thought MPI is favorable on cluster rather than on a single
> server. Why MPI+OPENMP seems faster than using OPENMP solely?

Practically, a system with eight NUMA zones is basically eight separate
systems with cache coherency. Since OpenMP only performs well inside a
single zone, running one MPI rank per zone suits such a machine better
than a single OpenMP process spanning all of them.

> 4. Why OPENCL seems faster than other available configuration?

It is problem and system specific. In my experience, when tuned
correctly, the OpenMP backend should be able to outperform the OpenCL
backend at higher polynomial orders.
However, it does require more work to configure.

Regards, Freddie.
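
To make the pinning point from the reply a little more concrete, here is a minimal Python sketch of mapping one MPI rank to each NUMA zone on the 64-core, eight-zone machine described there. It is not taken from PyFR: the helper names `zone_core_sets` and `pin_rank_to_zone` are hypothetical, and it assumes core IDs are numbered contiguously within each zone, which is not true on every system (check with a tool such as `numactl --hardware`). The number of threads per rank would still be chosen separately, for example through `OMP_NUM_THREADS`.

```python
import os


def zone_core_sets(n_cores, n_zones):
    """Split core IDs 0..n_cores-1 into n_zones contiguous, equal-sized sets.

    Assumption: cores in the same NUMA zone have consecutive IDs.
    """
    if n_cores % n_zones != 0:
        raise ValueError("cores must divide evenly among NUMA zones")
    per_zone = n_cores // n_zones
    return [set(range(z * per_zone, (z + 1) * per_zone))
            for z in range(n_zones)]


def pin_rank_to_zone(rank, n_cores, n_zones):
    """Restrict the calling process to the cores of the zone for this rank."""
    cores = zone_core_sets(n_cores, n_zones)[rank % n_zones]
    # os.sched_setaffinity is only available on Linux; pid 0 means
    # "the calling process". Threads created afterwards inherit the mask.
    os.sched_setaffinity(0, cores)
    return cores


if __name__ == "__main__":
    # Print the rank -> core mapping for 64 cores in eight zones.
    for rank, cores in enumerate(zone_core_sets(64, 8)):
        print(rank, sorted(cores))
```

In an MPI run each process would call `pin_rank_to_zone` with its own rank before starting any threads, so that the threads of one mesh piece stay inside a single zone. Many MPI launchers offer their own binding options that achieve the same thing without code changes.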
