Re: [petsc-dev] Improving and stabilizing GPU support

Karl Rupp Fri, 19 Jul 2013 15:22:37 -0700

Hi Paul,

>>> * Reduce CUSP dependency: The current elementary operations are
>>> mainly realized via CUSP. With better support via CUSPARSE and
>>> CUBLAS, I'd add a separate 'native' CUDA backend so that we can
>>> provide a full set of vector and sparse matrix operations out of the
>>> default NVIDIA toolchain. We will still keep CUSP for its
>>> preconditioners, yet we no longer depend on it.

> Agreed. In the past, I've suggested a -vec_type cuda (not cusp). All the
> CUSP operations can be done with Thrust algorithms. Since Thrust comes
> default with CUDA, one can have only a CUDA dependency.


Yes, I opt for
 -vec_type cuda

if everything needed is shipped with the CUDA toolkit. I even tend to
avoid Thrust as much as possible and go with CUBLAS/CUSPARSE because we
get faster compilation and fewer compiler warnings this way, but that's
an implementation detail :-)

>>> * Integrate last bits of txpetscgpu package. I assume Paul will
>>> provide a helping hand here.

> Of course. This will go much faster as much of the hard work is done. Do
> people want support for different matrix formats in the CUSP classes:
> i.e. diagonal, ellpack, hybrid? I think the CUSP preconditioners can be
> derived from matrices stored in non-csr format (although they're likely
> just doing a convert under the hood).

Since people keep asking for fast SpMV, we should provide these other
formats as well (actually, they are partially provided with your update
to the CUSPARSE bindings already). The main reason for CUSP is the SA
(smoothed aggregation) preconditioner, for which SpMV performance
doesn't really matter.
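
To illustrate why ELLPACK-type storage is attractive for SpMV, here is a
minimal host-side sketch in plain C. It is not the CUSP or CUSPARSE
implementation. The name ell_spmv, the parameter max_nnz_per_row, the
column-major layout, and the use of column index -1 for padding are all
assumptions made for illustration. Every row is padded to the same
number of entries, so the loop over rows is completely regular.

```c
#include <stddef.h>

/* y = A*x for a matrix in ELLPACK storage (illustrative sketch only).
 * Assumed layout: entry k of row i lives at index [k * nrows + i]
 * (column-major). On a GPU with one thread per row, this ordering lets
 * neighbouring threads read neighbouring memory locations.
 * Padding entries carry the (assumed) column index -1. */
void ell_spmv(size_t nrows, size_t max_nnz_per_row,
              const int *cols, const double *vals,
              const double *x, double *y)
{
    for (size_t i = 0; i < nrows; ++i) {
        double sum = 0.0;
        for (size_t k = 0; k < max_nnz_per_row; ++k) {
            int c = cols[k * nrows + i];
            if (c >= 0)                 /* skip padding entries */
                sum += vals[k * nrows + i] * x[c];
        }
        y[i] = sum;
    }
}
```

The price is the padding: if a few rows are much longer than the rest,
most of the stored entries are zeros, which is where a hybrid format
comes in.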

>>> * Documentation: Add a chapter on GPUs to the manual, particularly on
>>> what to expect and what not to expect. Update documentation on
>>> webpage regarding installation.

> I will help with the manual.


Cheers :-)

>>> * Integration of FEM quadrature from SNES ex52. The CUDA part
>>> requiring code generation is not very elegant, while the OpenCL
>>> approach is better suited for a library integration thanks to JIT.
>>> However, this requires user code to be provided as a string (again
>>> not very elegant) or loaded from file (more reasonable). How much FEM
>>> functionality do we want to provide via PETSc?
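
As a rough sketch of the "loaded from file" variant mentioned above: the
host side only needs to read the kernel source into a string before
handing it to the OpenCL runtime compiler. The helper name
load_kernel_source is hypothetical and nothing here is taken from ex52;
it is plain C without any OpenCL dependency.

```c
#include <stdio.h>
#include <stdlib.h>

/* Read a whole text file into a NUL-terminated buffer.
 * Returns NULL on any failure; the caller frees the result.
 * In an OpenCL code the returned string would then be passed to
 * clCreateProgramWithSource() and compiled at run time. */
char *load_kernel_source(const char *path)
{
    FILE *f = fopen(path, "rb");
    if (!f) return NULL;

    char *buf = NULL;
    long len = -1;
    if (fseek(f, 0, SEEK_END) == 0 && (len = ftell(f)) >= 0 &&
        fseek(f, 0, SEEK_SET) == 0 &&
        (buf = malloc((size_t)len + 1)) != NULL) {
        if (fread(buf, 1, (size_t)len, f) == (size_t)len) {
            buf[len] = '\0';            /* terminate the source string */
        } else {
            free(buf);                  /* short read: report failure */
            buf = NULL;
        }
    }
    fclose(f);
    return buf;
}
```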

> Multi-GPU is a highly pressing need, IMO. Need to figure out how to make
> Block Jacobi and ASM run efficiently.

The tricky part here is to balance processes vs. threads vs. GPUs. If we
use more than one GPU per process, we will duplicate more and more of
the current MPI logic over time just to move data between GPUs. However,
if we just use one GPU per process, we will under-utilize the CPU unless
we have a good interaction with threadcomm.
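
To make the trade-off concrete, here is a small sketch of the usual
"one GPU per process" mapping in plain C. The function names are
hypothetical and this is not PETSc code. It assumes the node-local rank
is already known (for example from a communicator obtained via
MPI_Comm_split_type with MPI_COMM_TYPE_SHARED) and that the returned
index would be passed to cudaSetDevice().

```c
/* Pick a device index for a process from its rank among the processes
 * on the same node and the number of GPUs visible on that node.
 * Round-robin assignment; returns -1 if no GPU is available. */
int select_device(int local_rank, int num_devices)
{
    if (num_devices <= 0 || local_rank < 0) return -1;
    return local_rank % num_devices;
}

/* Number of processes that end up sharing the given device under the
 * round-robin mapping above. */
int procs_on_device(int device, int procs_per_node, int num_devices)
{
    if (num_devices <= 0 || device < 0 || device >= num_devices) return 0;
    int base  = procs_per_node / num_devices;
    int extra = procs_per_node % num_devices;
    return base + (device < extra ? 1 : 0);
}
```

On a node with 16 cores and 2 GPUs, running one process per GPU leaves
14 cores idle unless threads pick up the slack, while running one
process per core makes 8 processes share each GPU.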


Best regards,
Karli
