Hi Andrea,

> Well, I have 9 equations, so 9x9 I guess...

Ok, this is just in the range where it is still technically meaningful (register sizes), but it gets challenging implementation-wise (explicit inversion formulas vs. Gaussian elimination with pivoting).
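
For what it's worth, the "Gauss with pivoting" route for a single 9x9 block could look roughly like the sketch below. This is only an illustration of the approach, not ViennaCL code; the fixed size N, the name solve_small, and the row-major layout are my choices. On a GPU one would inline something like this per thread (or per small thread group), which is why register usage is the concern at this size.

#include <cmath>
#include <utility>

constexpr int N = 9;  // block size from the discussion above; any small N works

// Solve A*x = b in place (the solution overwrites b). A is row-major N x N.
// Returns false if a pivot is exactly zero.
bool solve_small(double A[N][N], double b[N])
{
  for (int k = 0; k < N; ++k)
  {
    // Partial pivoting: pick the row with the largest entry in column k.
    int p = k;
    for (int i = k + 1; i < N; ++i)
      if (std::fabs(A[i][k]) > std::fabs(A[p][k]))
        p = i;
    if (A[p][k] == 0.0)
      return false;
    if (p != k)
    {
      for (int j = k; j < N; ++j)
        std::swap(A[k][j], A[p][j]);
      std::swap(b[k], b[p]);
    }

    // Eliminate column k below the pivot.
    for (int i = k + 1; i < N; ++i)
    {
      double factor = A[i][k] / A[k][k];
      for (int j = k; j < N; ++j)
        A[i][j] -= factor * A[k][j];
      b[i] -= factor * b[k];
    }
  }

  // Back substitution.
  for (int i = N - 1; i >= 0; --i)
  {
    for (int j = i + 1; j < N; ++j)
      b[i] -= A[i][j] * b[j];
    b[i] /= A[i][i];
  }
  return true;
}

Explicit inversion formulas (cofactors etc.) stop being attractive well before 9x9, so a small pivoted elimination like this is the more robust choice at that size.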


> I hope the one you are mentioning was a major bug, because what I get is
> seriously wrong: while on a single GPU (KSPGMRES+PCASM) I get a residual
> of +0.72, on 8 cores/GPU I get -1.00 at the first time step, just to
> give an example. Can this be due to the bug you mention, or do you
> suspect something more?

Yes, this was a major bug, breaking the matrix-vector product when using multiple MPI ranks with GPUs.


> What should I do then? Wait for the valgrind fix which is underway and
> then update? Can you please notify me when this is fixed? I'm writing a
> final report for a project and I would like to include this feature,
> fully fixed, if possible.

I will merge the fix to master tomorrow when I'm back on my main GPU machine (there do not seem to be any problems in 'next' with the patch) and fix the valgrind complaints separately. The second issue is not directly related to the first; it just happens to show up in the same module.

> Another question, what exactly do you mean by "order the unknowns
> properly" in this case?

If you build the elimination graph for the triangular factors of ILU preconditioners, then the ordering of the unknowns (i.e. the way you assign the degrees of freedom (DOFs) on your mesh) can have a considerable influence on the amount of parallelism. The Cuthill-McKee algorithm, for example, is quite good at reducing the bandwidth of a sparse matrix, but it may also reduce the amount of parallelism available in the ILU0 factors compared to, e.g., a red-black ordering of the DOFs. I can send you a preprint if you're interested; a small toy example follows below.
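
To make this concrete, here is a small sketch (my own toy example, not taken from the preprint) that counts the number of sequential "levels" in the forward solve with the lower ILU0 factor of a 1D chain of unknowns. The level of an unknown is one more than the largest level among the unknowns it depends on, and all unknowns in the same level can be processed in parallel. With the natural ordering every unknown waits for its left neighbour, so there are as many levels as unknowns; with a red-black ordering all red unknowns go first and all black unknowns right after, i.e. only two levels.

#include <algorithm>
#include <cstdio>
#include <vector>

// Count levels given, for each row i, the list of strictly lower dependencies
// (column indices of the lower triangular factor).
int count_levels(const std::vector<std::vector<int>>& deps)
{
  std::vector<int> level(deps.size(), 0);
  int max_level = 0;
  for (std::size_t i = 0; i < deps.size(); ++i)
  {
    for (int j : deps[i])
      level[i] = std::max(level[i], level[j] + 1);
    max_level = std::max(max_level, level[i]);
  }
  return max_level + 1;
}

int main()
{
  const int n = 100; // unknowns on a 1D chain, each coupled to its neighbours

  // Natural ordering: unknown i depends on its left neighbour i-1.
  std::vector<std::vector<int>> natural(n);
  for (int i = 1; i < n; ++i)
    natural[i].push_back(i - 1);

  // Red-black ordering: even grid points (red) are numbered first, odd points
  // (black) after them. A black point depends on its red neighbours, which
  // therefore have smaller indices.
  std::vector<int> perm(n); // perm[grid point] = new index
  int next = 0;
  for (int i = 0; i < n; i += 2) perm[i] = next++;
  for (int i = 1; i < n; i += 2) perm[i] = next++;
  std::vector<std::vector<int>> redblack(n);
  for (int i = 1; i < n; i += 2) // black points
  {
    redblack[perm[i]].push_back(perm[i - 1]);
    if (i + 1 < n)
      redblack[perm[i]].push_back(perm[i + 1]);
  }

  std::printf("levels, natural ordering:   %d\n", count_levels(natural));  // -> 100 (= n)
  std::printf("levels, red-black ordering: %d\n", count_levels(redblack)); // -> 2
  return 0;
}

On such a chain, Cuthill-McKee essentially reproduces the natural ordering (which is optimal for bandwidth), so the same example also shows how a bandwidth-minimizing ordering can leave the triangular solves almost entirely sequential.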

Best regards,
Karli
