2011/6/6 Dürrwang, Jürgen <Juergen.Duerrwang at iosb.fraunhofer.de>

> 1.Load the matrix to be solved onto CPU and GPU
>
> 2.Decompose in blocks, so on each block an ILU(0) can run in
> "parallel".              : CPU
>
> 3.Loop until tolerance is reached
>
> 4.Solve each block in parallel to get a preconditioner
>                                                : CPU
>
> 5.Solve CG with preconditioner to bring down the iteration count
>                 :GPU
>

Step 5 is not all on the GPU. You do a matrix multiply and a dot product on
the GPU, then move the vector over to the CPU, put the pieces on different
cores, solve, and put the result back on the GPU.


> 6.End loop
>
>
>
> There are about 4 copies between CPU /GPU per step, but that isn't a
> problem
>

You have a copy each way *per CG iteration*. I think it is a problem.
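
To make the copy count concrete, here is a minimal sketch in plain Python.
It is not PETSc code and uses no real GPU: to_cpu and to_gpu are
hypothetical helpers that only count the transfers such a CPU/GPU split
would need. The exact 2x2 block solve is an assumption standing in for the
per-block ILU(0), and the small tridiagonal test matrix is made up for
illustration.

```python
# Sketch: preconditioned CG where the matvec, dot products, and vector
# updates are imagined to run on the GPU, while the block preconditioner
# runs on the CPU. Nothing here touches a real device; the two copy
# helpers only count transfers.

copies = {"gpu_to_cpu": 0, "cpu_to_gpu": 0}

def to_cpu(v):
    """Hypothetical device-to-host copy (counted, not performed)."""
    copies["gpu_to_cpu"] += 1
    return list(v)

def to_gpu(v):
    """Hypothetical host-to-device copy (counted, not performed)."""
    copies["cpu_to_gpu"] += 1
    return list(v)

def matvec(A, x):
    return [sum(a * xj for a, xj in zip(row, x)) for row in A]

def dot(x, y):
    return sum(a * b for a, b in zip(x, y))

def block_solve_cpu(A, r):
    """Block Jacobi with 2x2 diagonal blocks, each solved exactly.

    Stand-in for the per-block ILU(0); the blocks are independent, so
    each one could go to a different core. Assumes len(r) is even.
    """
    z = [0.0] * len(r)
    for s in range(0, len(r), 2):
        a, b = A[s][s], A[s][s + 1]
        c, d = A[s + 1][s], A[s + 1][s + 1]
        det = a * d - b * c
        z[s] = (d * r[s] - b * r[s + 1]) / det
        z[s + 1] = (a * r[s + 1] - c * r[s]) / det
    return z

def apply_preconditioner(A, r):
    # One copy each way for every application of the preconditioner.
    return to_gpu(block_solve_cpu(A, to_cpu(r)))

def pcg(A, b, tol=1e-10, max_it=100):
    x = [0.0] * len(b)
    r = list(b)                       # residual for the zero initial guess
    z = apply_preconditioner(A, r)
    p = list(z)
    rz = dot(r, z)
    for it in range(1, max_it + 1):
        Ap = matvec(A, p)             # "GPU"
        alpha = rz / dot(p, Ap)       # "GPU"
        x = [xi + alpha * pi for xi, pi in zip(x, p)]
        r = [ri - alpha * qi for ri, qi in zip(r, Ap)]
        if dot(r, r) ** 0.5 < tol:
            return x, it
        z = apply_preconditioner(A, r)  # GPU -> CPU -> GPU
        rz_new = dot(r, z)
        p = [zi + (rz_new / rz) * pi for zi, pi in zip(z, p)]
        rz = rz_new
    return x, max_it

# Small symmetric positive definite test problem (1D Laplacian stencil).
n = 8
A = [[2.0 if i == j else -1.0 if abs(i - j) == 1 else 0.0
      for j in range(n)] for i in range(n)]
b = [1.0] * n
x, its = pcg(A, b)
print("iterations:", its, "copies:", copies)
```

The counters grow with the iteration count, not with the number of outer
steps, which is the point above: the transfer sits inside the CG loop.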