@Jed Brown

I copy the whole matrix and the solution vectors from each
ILU block (= preconditioner) to the GPU, where I can solve with CG.

At the moment I have finished a CG solver on the GPU using an algorithm from Saad.
It is very fast: for a matrix of size 640000x640000 with about 4,500,000 non-zero
elements, I need only 900 ms to reach a tolerance of 10e-3. But I want a mix of a
stable and a fast solver, so I implemented a CG solver with ILU(0) preconditioning,
where the ILU is unfortunately a serial CPU implementation (ILU decomposition and
solve on the CPU, CG operations on the GPU). It computes the solution for the same
matrix size in 2.6 s. So I thought it would be nice if I could use all of my CPU
cores instead of only one. Perhaps I can then get down to 1.5 s.

That's the way I want to go:

1. Load the matrix to be solved onto the CPU and GPU.
2. Decompose it into blocks, so an ILU(0) can run on each block in "parallel" (CPU).
3. Loop until the tolerance is reached:
4. Solve each block in parallel to obtain the preconditioner (CPU).
5. Run CG with the preconditioner to reduce the iteration count (GPU).
6. End loop.
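The loop above can be sketched as follows. This is a minimal toy in pure Python on a small SPD system, with the blocks solved exactly in place of the per-block ILU(0) factorization; all names are illustrative, not PETSc API, and in the real setup the block solves would run on CPU cores while the CG vector kernels run on the GPU.

```python
# Toy sketch of block-Jacobi preconditioned CG (hypothetical names,
# not PETSc API). Each 2x2 diagonal block is solved exactly here;
# the real plan uses an ILU(0) factorization per block instead.

def dot(u, v):
    return sum(a * b for a, b in zip(u, v))

def matvec(A, x):
    return [dot(row, x) for row in A]

def solve2x2(B, r):
    # Exact 2x2 block solve; stands in for the per-block ILU(0) solve.
    det = B[0][0] * B[1][1] - B[0][1] * B[1][0]
    return [(B[1][1] * r[0] - B[0][1] * r[1]) / det,
            (B[0][0] * r[1] - B[1][0] * r[0]) / det]

def apply_prec(blocks, r):
    # "Solve each block in parallel": apply M^-1 block by block.
    z, i = [], 0
    for B in blocks:
        z.extend(solve2x2(B, r[i:i + 2]))
        i += 2
    return z

def pcg(A, b, blocks, tol=1e-10, maxit=100):
    # Preconditioned conjugate gradients, looping until the tolerance
    # is reached (steps 3-6 of the plan).
    x = [0.0] * len(b)
    r = b[:]                               # r = b - A*x0 with x0 = 0
    z = apply_prec(blocks, r)
    p = z[:]
    rz = dot(r, z)
    for _ in range(maxit):
        Ap = matvec(A, p)
        alpha = rz / dot(p, Ap)
        x = [xi + alpha * pi for xi, pi in zip(x, p)]
        r = [ri - alpha * qi for ri, qi in zip(r, Ap)]
        if dot(r, r) ** 0.5 < tol:
            break
        z = apply_prec(blocks, r)
        rz, rz_old = dot(r, z), rz
        beta = rz / rz_old
        p = [zi + beta * pi for zi, pi in zip(z, p)]
    return x

# Small symmetric, strictly diagonally dominant (hence SPD) test matrix.
A = [[4.0, 1.0, 0.0, 1.0],
     [1.0, 3.0, 1.0, 0.0],
     [0.0, 1.0, 3.0, 1.0],
     [1.0, 0.0, 1.0, 3.0]]
b = [1.0, 2.0, 3.0, 4.0]
blocks = [[row[0:2] for row in A[0:2]],   # the two diagonal 2x2 blocks of A
          [row[2:4] for row in A[2:4]]]
x = pcg(A, b, blocks)
```

The block solves are independent of each other, which is what makes the "one block per CPU core" part of the plan work; the CG updates in between are the part that maps onto GPU vector kernels.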

There are about 4 copies between CPU and GPU per step, but that isn't a problem.

I haven't seen the PETSc GPU manual until now.


Yes, I tried some PETSc examples and modified one for my purposes. It works very
well on my Xeon quad-core, but my intention is to mix CPU and GPU code. I want a
parallel domain decomposition using the block Jacobi method, running ILU(0) on
each block (number of blocks = number of CPU cores). Then I want to use the
results of the block solves as a preconditioner for a CG solver on the GPU.

What is the GPU going to do while this is taking place on the CPU? I don't see 
much point doing CG on the GPU if you don't also move the matrix and 
preconditioner there. (The performance may even be worse than doing everything 
on the CPU.)

Have you read the docs on running PETSc on GPUs?

http://www.mcs.anl.gov/petsc/petsc-as/documentation/faq.html#gpus
http://www.mcs.anl.gov/petsc/petsc-as/snapshots/petsc-dev/docs/installation.html#CUDA

There is no ILU on the GPU because nobody has written it (because it seems to 
be ill-suited to the execution model).


At the moment I can decompose my matrix into four Jacobi block matrices. I
compared my results with PETSc and they are the same. But now I don't know
whether I have to run my CG solver on each block, or whether I could put the
results of each block ILU together and then use that as a preconditioner for the
non-blocked matrix (my large input matrix).

You can do either of these; -pc_type asm -sub_ksp_type cg -sub_pc_type icc, for 
example. Be careful about symmetry and remember to use FGMRES if you make the 
preconditioner nonlinear.
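For reference, both variants can be selected purely from the run-time options; a sketch, where `./ex` is a placeholder for your PETSc executable:

```shell
# One ILU(0) block per process (block Jacobi):
mpiexec -n 4 ./ex -ksp_type cg -pc_type bjacobi -sub_pc_type ilu

# Additive Schwarz with an inner CG/ICC solve per block; the inner Krylov
# solve makes the preconditioner nonlinear, so pair it with FGMRES outside:
mpiexec -n 4 ./ex -ksp_type fgmres -pc_type asm -sub_ksp_type cg -sub_pc_type icc
```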