2011/6/6 Dürrwang, Jürgen <Juergen.Duerrwang at iosb.fraunhofer.de>

> 1.Load Matrix which should be solve to CPU and GPU
>
> 2.Decompose in blocks, so on each block an ILU(0) can run in
> "parallel". : CPU
>
> 3.Loop until tolerance is reached
>
> 4.Solve each block in parallel to get an preconditioner
> : CPU
>
> 5.Solve CG with preconditioner to break down iteration number
> :GPU
>
Step 5 is not all on the GPU. You do a matrix multiply and a dot product on
the GPU, then move the vector over to the CPU, put the pieces on different
cores, solve, and put it back on the GPU.

> 6.End loop
>
>
>
> There are about 4 copies between CPU /GPU per step, but that isn't a
> problem
>

You have a copy each way *per CG iteration*. I think it is a problem.
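To make the count concrete, here is a minimal pure-Python sketch of preconditioned CG with a block-Jacobi preconditioner. It is not PETSc code and nothing runs on a real GPU: the 1D Laplacian test matrix, the function names (`matvec`, `solve_block`, `apply_preconditioner`, `pcg`) and the `counter` dictionary are all assumptions made for illustration. The block solve is an exact tridiagonal solve, which for a tridiagonal block coincides with ILU(0) because no fill is dropped. The counter is bumped wherever the residual would have to leave the GPU and the preconditioned vector would have to come back.

```python
def matvec(x):
    """y = A x for A = tridiag(-1, 2, -1); stands in for the GPU mat-vec."""
    n = len(x)
    y = []
    for i in range(n):
        v = 2.0 * x[i]
        if i > 0:
            v -= x[i - 1]
        if i < n - 1:
            v -= x[i + 1]
        y.append(v)
    return y


def dot(a, b):
    """Dot product; stands in for the GPU reduction."""
    return sum(p * q for p, q in zip(a, b))


def solve_block(r):
    """Solve tridiag(-1, 2, -1) z = r on one diagonal block (Thomas algorithm).

    A tridiagonal block has no fill-in, so this exact solve equals ILU(0).
    """
    m = len(r)
    c = [0.0] * m  # modified upper diagonal
    d = [0.0] * m  # modified right-hand side
    c[0] = -1.0 / 2.0
    d[0] = r[0] / 2.0
    for i in range(1, m):
        denom = 2.0 + c[i - 1]
        c[i] = -1.0 / denom
        d[i] = (r[i] + d[i - 1]) / denom
    z = [0.0] * m
    z[m - 1] = d[m - 1]
    for i in range(m - 2, -1, -1):
        z[i] = d[i] - c[i] * z[i + 1]
    return z


def apply_preconditioner(r, block_size, counter):
    """Block-Jacobi apply on the "CPU"; counts the two transfers it implies."""
    counter["gpu_to_cpu"] += 1  # residual is copied off the GPU
    z = []
    # The blocks are independent, so each could go to a different core.
    for start in range(0, len(r), block_size):
        z.extend(solve_block(r[start:start + block_size]))
    counter["cpu_to_gpu"] += 1  # preconditioned vector is copied back
    return z


def pcg(b, block_size, tol=1e-10, max_it=1000):
    """Preconditioned CG; returns the solution, iteration count and counters."""
    counter = {"gpu_to_cpu": 0, "cpu_to_gpu": 0}
    x = [0.0] * len(b)
    r = list(b)
    z = apply_preconditioner(r, block_size, counter)
    p = list(z)
    rz = dot(r, z)
    b_norm = dot(b, b) ** 0.5
    iterations = 0
    while dot(r, r) ** 0.5 > tol * b_norm and iterations < max_it:
        Ap = matvec(p)
        alpha = rz / dot(p, Ap)
        x = [xi + alpha * pi for xi, pi in zip(x, p)]
        r = [ri - alpha * api for ri, api in zip(r, Ap)]
        # The preconditioner is applied inside the loop, once per iteration.
        z = apply_preconditioner(r, block_size, counter)
        rz_new = dot(r, z)
        beta = rz_new / rz
        p = [zi + beta * pi for zi, pi in zip(z, p)]
        rz = rz_new
        iterations += 1
    return x, iterations, counter


if __name__ == "__main__":
    x, iterations, counter = pcg([1.0] * 64, block_size=16)
    print("iterations:", iterations)
    print("transfers:", counter)
```

In this sketch each counter ends up at the iteration count plus one (the extra one is the initial preconditioner apply), so the transfer cost grows with the number of CG iterations rather than being a fixed handful of copies per outer step.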
