Hi Junting,

On 06/01/2020 12:34, Junting Chen wrote:

> As far as I know, when using multiple GPUs, I had to select local-rank
> for device-id and cuda-aware for mpi-type. When exactly should I be
> using round-robin and local-rank? And when should I be using standard
> or cuda-aware?
If the GPUs in your system are in compute exclusive mode then
round-robin is probably what you want; otherwise, opt for local-rank.
So long as each rank gets its own GPU there should be no impact on
performance.

In terms of the mpi-type, this depends heavily on the hardware you are
running on and the MPI library you are using. If your MPI library is
CUDA aware then setting mpi-type = cuda-aware can improve performance.

> How would you select the GiMMiK cutoff? How does it affect accuracy /
> performance?

The cutoff is essentially a performance knob: it only decides which code
path is used for the operator matrix multiplications, so it should have
no meaningful impact on accuracy. Some experimentation is needed to find
a good value, as the optimum depends on the element types you are using,
whether anti-aliasing is enabled, and the CPU that you are running on.

> I believe block-1d and block-2d are determined by the GPU's
> specification. I am not very familiar with CUDA. Could someone please
> elaborate a bit? For example, I am running PyFR with two Tesla K80s in
> parallel; what are the block sizes for the 1D and 2D pointwise
> kernels?

You should seldom need to modify either of these two values. On some
pathological meshes reducing block-1d can improve performance, but not
by a lot.

Regards, Freddie.
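P.S. To make the above concrete, the options discussed all live in the
[backend-cuda] section of the .ini file. The sketch below is only
illustrative: I am quoting the option names from memory (in particular,
the GiMMiK cutoff is set via gimmik-max-nnz) and the values shown are
roughly the defaults, so please check the documentation for the version
of PyFR you are running.

[backend-cuda]
; local-rank binds each MPI rank on a node to its own GPU; use
; round-robin instead if the GPUs are in compute exclusive mode
device-id = local-rank

; set to cuda-aware only if your MPI library was built with CUDA
; support; otherwise leave it as standard
mpi-type = standard

; operator matrices with up to this many non-zeros use GiMMiK's
; generated kernels; larger ones fall back to the vendor BLAS
gimmik-max-nnz = 512

; block sizes for the 1D and 2D pointwise kernels; the defaults are
; almost always fine
block-1d = 64
block-2d = 128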
