Hi Andrea,

> Well, I have 9 equations, so 9x9 I guess...

Ok, this is just in the range where it is still technically meaningful (register sizes), but it gets challenging implementation-wise (explicit inversion formulas vs. Gaussian elimination with pivoting).
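
For what it's worth, the "Gauss with pivoting" route for a single 9x9 block could look roughly like the sketch below. This is only an illustration of the approach, not ViennaCL code; the fixed size N, the name solve_small, and the row-major layout are my choices. On a GPU one would inline something like this per thread (or per small thread group), which is why register usage is the concern at this size.

#include <cmath>
#include <utility>

constexpr int N = 9;  // block size from the discussion above; any small N works

// Solve A*x = b in place (the solution overwrites b). A is row-major N x N.
// Returns false if a pivot is exactly zero.
bool solve_small(double A[N][N], double b[N])
{
  for (int k = 0; k < N; ++k)
  {
    // Partial pivoting: pick the row with the largest entry in column k.
    int p = k;
    for (int i = k + 1; i < N; ++i)
      if (std::fabs(A[i][k]) > std::fabs(A[p][k]))
        p = i;
    if (A[p][k] == 0.0)
      return false;
    if (p != k)
    {
      for (int j = k; j < N; ++j)
        std::swap(A[k][j], A[p][j]);
      std::swap(b[k], b[p]);
    }

    // Eliminate column k below the pivot.
    for (int i = k + 1; i < N; ++i)
    {
      double factor = A[i][k] / A[k][k];
      for (int j = k; j < N; ++j)
        A[i][j] -= factor * A[k][j];
      b[i] -= factor * b[k];
    }
  }

  // Back substitution.
  for (int i = N - 1; i >= 0; --i)
  {
    for (int j = i + 1; j < N; ++j)
      b[i] -= A[i][j] * b[j];
    b[i] /= A[i][i];
  }
  return true;
}

Explicit inversion formulas (cofactors etc.) stop being attractive well before 9x9, so a small pivoted elimination like this is the more robust choice at that size.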


> I hope the one you are mentioning was a major bug, because what I get is
> seriously wrong: while on a single GPU (KSPGMRES+PCASM) I get a residual
> of +0.72, on 8 cores/GPU I get -1.00 at the first time step, just to
> give an example. Can this be due to the bug you mention, or do you
> suspect something more?

Yes, this was a major bug, breaking the matrix-vector product when using multiple MPI ranks with GPUs.


> What should I do then? Wait for the valgrind fix which is underway and
> then update? Can you please notify me when this is fixed? I'm writing a
> final report for a project and I would like to include this feature,
> fully fixed, if possible.

I will merge the fix to master tomorrow when I'm back on my main GPU machine (there do not seem to be any problems in 'next' with the patch) and fix the valgrind complaints separately. The second issue is not directly related to the first; it just happens to show up in the same module.

> Another question, what exactly do you mean by "order the unknowns
> properly" in this case?

If you build the elimination graph for the triangular factors of ILU preconditioners, then the ordering of the unknowns (i.e. the way you assign the degrees of freedom (DOFs) on your mesh) can have a considerable influence on the amount of parallelism. The Cuthill-McKee algorithm, for example, is quite good at reducing the bandwidth of a sparse matrix, but it may also reduce the amount of parallelism available in the ILU0 factors compared to, e.g., a red-black ordering of the DOFs. I can send you a preprint if you're interested; a small toy example follows below.
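
To make this concrete, here is a small sketch (my own toy example, not taken from the preprint) that counts the number of sequential "levels" in the forward solve with the lower ILU0 factor of a 1D chain of unknowns. The level of an unknown is one more than the largest level among the unknowns it depends on, and all unknowns in the same level can be processed in parallel. With the natural ordering every unknown waits for its left neighbour, so there are as many levels as unknowns; with a red-black ordering all red unknowns go first and all black unknowns right after, i.e. only two levels.

#include <algorithm>
#include <cstdio>
#include <vector>

// Count levels given, for each row i, the list of strictly lower dependencies
// (column indices of the lower triangular factor).
int count_levels(const std::vector<std::vector<int>>& deps)
{
  std::vector<int> level(deps.size(), 0);
  int max_level = 0;
  for (std::size_t i = 0; i < deps.size(); ++i)
  {
    for (int j : deps[i])
      level[i] = std::max(level[i], level[j] + 1);
    max_level = std::max(max_level, level[i]);
  }
  return max_level + 1;
}

int main()
{
  const int n = 100; // unknowns on a 1D chain, each coupled to its neighbours

  // Natural ordering: unknown i depends on its left neighbour i-1.
  std::vector<std::vector<int>> natural(n);
  for (int i = 1; i < n; ++i)
    natural[i].push_back(i - 1);

  // Red-black ordering: even grid points (red) are numbered first, odd points
  // (black) after them. A black point depends on its red neighbours, which
  // therefore have smaller indices.
  std::vector<int> perm(n); // perm[grid point] = new index
  int next = 0;
  for (int i = 0; i < n; i += 2) perm[i] = next++;
  for (int i = 1; i < n; i += 2) perm[i] = next++;
  std::vector<std::vector<int>> redblack(n);
  for (int i = 1; i < n; i += 2) // black points
  {
    redblack[perm[i]].push_back(perm[i - 1]);
    if (i + 1 < n)
      redblack[perm[i]].push_back(perm[i + 1]);
  }

  std::printf("levels, natural ordering:   %d\n", count_levels(natural));  // -> 100 (= n)
  std::printf("levels, red-black ordering: %d\n", count_levels(redblack)); // -> 2
  return 0;
}

On such a chain, Cuthill-McKee essentially reproduces the natural ordering (which is optimal for bandwidth), so the same example also shows how a bandwidth-minimizing ordering can leave the triangular solves almost entirely sequential.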

Best regards,
Karli
