Hello, I am wondering if someone can provide a bit more description of these parameters to help optimize performance.
As far as I know, when using multiple GPUs I had to select local-rank for device-id and cuda-aware for mpi-type. When exactly should I use round-robin versus local-rank? And when should I use standard versus cuda-aware? How should the GiMMiK cutoff be selected, and how does it affect accuracy and performance? I believe block-1d and block-2d are determined by the GPU's specification, but I am not very familiar with CUDA, so could someone please elaborate a bit? For example, if I am running PyFR with two Tesla K80s in parallel, what block sizes should I use for the 1D and 2D pointwise kernels?

For reference, the documentation parameterises the CUDA backend with:

1. device-id — method for selecting which device(s) to run on: *int* | round-robin | local-rank
2. gimmik-max-nnz — cutoff for GiMMiK in terms of the number of non-zero entries in a constant matrix: *int*
3. mpi-type — type of MPI library that is being used: standard | cuda-aware
4. block-1d — block size for one dimensional pointwise kernels: *int*
5. block-2d — block size for two dimensional pointwise kernels: *int*, *int*

Thanks a lot!

Junting Chen

--
You received this message because you are subscribed to the Google Groups "PyFR Mailing List" group.
To unsubscribe from this group and stop receiving emails from it, send an email to [email protected].
To view this discussion on the web, visit https://groups.google.com/d/msgid/pyfrmailinglist/cbe65aa4-1765-4fcf-a6ff-a641ef378e9d%40googlegroups.com.
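For context, here is a minimal sketch of where these options live in a PyFR `.ini` file for the setup described above (one MPI rank per GPU). The specific numeric values are illustrative assumptions only, not tuned recommendations, and cuda-aware should only be set if the MPI library was actually built with CUDA support:

```ini
[backend-cuda]
; one MPI rank per GPU on each node, so let local-rank pick the device
device-id = local-rank
; assumption: the MPI library is CUDA-aware; otherwise use "standard"
mpi-type = cuda-aware
; assumption: use GiMMiK for constant matrices with up to 512 non-zeros
gimmik-max-nnz = 512
; assumed block sizes for the 1D and 2D pointwise kernels (placeholders)
block-1d = 64
block-2d = 128, 1
```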
