Hello,

I am wondering if someone can provide a bit more description of these 
parameters, with a view to optimizing performance.

As far as I know, when using multiple GPUs, I have had to select local-rank 
for device-id and cuda-aware for mpi-type. When exactly should I use 
round-robin versus local-rank? And when should I use standard versus 
cuda-aware?

How would you select the GiMMiK cutoff? How does it affect accuracy and 
performance?

I believe block-1d and block-2d are determined by the GPU's specifications. 
I am not very familiar with CUDA, so could someone please elaborate a bit? 
For example, I am running PyFR with two Tesla K80s in parallel; what block 
sizes should I use for the 1D and 2D pointwise kernels?


Parameterises the CUDA backend with:

   1. device-id — method for selecting which device(s) to run on:
      *int* | round-robin | local-rank
   2. gimmik-max-nnz — cutoff for GiMMiK in terms of the number of non-zero 
      entries in a constant matrix:
      *int*
   3. mpi-type — type of MPI library that is being used:
      standard | cuda-aware
   4. block-1d — block size for one dimensional pointwise kernels:
      *int*
   5. block-2d — block size for two dimensional pointwise kernels:
      *int*, *int*

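For reference, these options go in the [backend-cuda] section of the PyFR 
configuration file. A minimal sketch of such a section is below; the numeric 
values are illustrative assumptions on my part, not recommendations:

```ini
[backend-cuda]
; select the GPU by the MPI rank local to each node (one rank per GPU)
device-id = local-rank

; hand constant matrices with more than this many non-zeros to the
; vendor BLAS instead of GiMMiK (illustrative value)
gimmik-max-nnz = 512

; only valid if the MPI library was actually built with CUDA support
mpi-type = cuda-aware

; launch block sizes for the pointwise kernels (illustrative values)
block-1d = 64
block-2d = 128, 2
```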

Thanks a lot!

Junting Chen
