Hi Andrea,
In fact, I have another major problem: when running on multiple GPUs with PETSc, my results are completely inconsistent with the single-GPU results.
This was a bug which was fixed a couple of days ago. The fix is in the 'next' branch, but not yet merged to master since that branch has another valgrind issue I haven't nailed down yet.
In my code, for now, I'm assuming a one-to-one correspondence between CPU cores and GPUs: I run on 8 cores and 8 GPUs (4 K10 cards). How can I enforce this in the PETSc solver? Is it handled automatically, or do I have to specify some options?
One MPI rank maps to one logical GPU. In your case, please run with 8 MPI ranks and distribute them equally over the nodes equipped with the GPUs.
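In case it helps, here is a minimal sketch of the general one-rank-per-GPU binding pattern, using MPI-3 shared-memory communicators and the CUDA runtime. This is not PETSc's internal logic, just an illustration of the idea, and the whole program is hypothetical:

  /* Minimal sketch: bind each MPI rank to one GPU on its node.
   * Assumes ranks are spread evenly over the GPU-equipped nodes. */
  #include <mpi.h>
  #include <cuda_runtime.h>

  int main(int argc, char **argv)
  {
    int local_rank, num_devices;
    MPI_Comm local_comm;

    MPI_Init(&argc, &argv);

    /* Node-local rank via an MPI-3 shared-memory split. */
    MPI_Comm_split_type(MPI_COMM_WORLD, MPI_COMM_TYPE_SHARED, 0,
                        MPI_INFO_NULL, &local_comm);
    MPI_Comm_rank(local_comm, &local_rank);

    /* One rank <-> one logical GPU. */
    cudaGetDeviceCount(&num_devices);
    cudaSetDevice(local_rank % num_devices);

    /* ... solver setup goes here ... */

    MPI_Comm_free(&local_comm);
    MPI_Finalize();
    return 0;
  }

With 4 K10 cards (8 logical GPUs) you would then launch with 8 ranks, e.g. "mpirun -np 8" with four ranks per node; the exact placement flags depend on your MPI implementation.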
As for the preconditioners: we haven't added any new preconditioners recently. Preconditioning on GPUs is a very problem-specific thing due to the burden of PCI-Express latency. Massively parallel approaches such as Sparse Approximate Inverses perform well in terms of theoretical FLOP counts, but are poor in terms of convergence and pretty expensive in terms of memory when running many simultaneous factorizations. ILU on the GPU can be fast if you order the unknowns properly and have only a few nonzeros per row, but it is not great in terms of convergence rate either. PCI-Express bandwidth and latency are really a problem here...
How large are your blocks when using a block-Jacobi preconditioner for your problem? On the order of 3x3, or (much) larger?
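If you end up experimenting with block-Jacobi, it is usually selected roughly like this (just a sketch; 'ksp' is assumed to be your already-created solver object and the helper name is purely illustrative):

  #include <petscksp.h>

  /* Sketch: switch an existing KSP to block-Jacobi; each block then uses
   * PETSc's default sub-solver (ILU(0)) unless overridden via options. */
  PetscErrorCode use_block_jacobi(KSP ksp)
  {
    PC pc;
    PetscErrorCode ierr;

    ierr = KSPGetPC(ksp, &pc);CHKERRQ(ierr);
    ierr = PCSetType(pc, PCBJACOBI);CHKERRQ(ierr);

    /* Equivalent run-time options:
     *   -pc_type bjacobi -sub_ksp_type preonly -sub_pc_type ilu */
    ierr = KSPSetFromOptions(ksp);CHKERRQ(ierr);
    return 0;
  }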
Best regards, Karli
