2011/11/29 Matthew Knepley <knepley at gmail.com>

> On Tue, Nov 29, 2011 at 2:38 AM, Fredrik Heffer Valdmanis <fredva at ifi.uio.no> wrote:
>
>> 2011/10/28 Matthew Knepley <knepley at gmail.com>
>>
>>> On Fri, Oct 28, 2011 at 10:24 AM, Fredrik Heffer Valdmanis <fredva at ifi.uio.no> wrote:
>>>
>>>> Hi,
>>>>
>>>> I am working on integrating the new GPU-based vectors and matrices into FEniCS. I am now looking at the possibility of getting some speedup during finite element assembly, specifically when inserting the local element matrix into the global matrix. In that regard, I have a few questions I hope you can help me with:
>>>>
>>>> - When calling MatSetValues with a MATSEQAIJCUSP matrix as parameter, what exactly happens? As far as I can see, MatSetValues is not implemented for GPU-based matrices, nor is mat->ops->setvalues set to point at any function for this Mat type.
>>>
>>> Yes, MatSetValues always operates on the CPU side. It would not make sense to do individual operations on the GPU.
>>>
>>> I have written batched assembly for element matrices that are all the same size:
>>>
>>> http://www.mcs.anl.gov/petsc/petsc-as/snapshots/petsc-current/docs/manualpages/Mat/MatSetValuesBatch.html
>>>
>>>> - Are matrices assembled in their entirety on the CPU and then copied over to the GPU (after calling MatAssemblyBegin)? Or are values copied over to the GPU each time you call MatSetValues?
>>>
>>> That function assembles the matrix on the GPU and then copies it to the CPU. The only time you do not want this copy is when you are running in serial and never touch the matrix afterwards, so I left it in.
>>>
>>>> - Can we expect to see any speedup from using MatSetValuesBatch over MatSetValues, or is the batch version simply a utility function? This question goes for both CPU- and GPU-based matrices.
>>>
>>> CPU: no
>>>
>>> GPU: yes, I see about the memory bandwidth ratio
>>>
>> Hi,
>>
>> I have now integrated MatSetValuesBatch into our existing PETSc wrapper layer. I have tested matrix assembly with Poisson's equation on different meshes with elements of varying order. I have timed the single call to MatSetValuesBatch and compared it to the total time consumed by the repeated calls to MatSetValues in the old implementation. I have the following results:
>>
>> Poisson on 1000x1000 unit square, 1st order Lagrange elements:
>> MatSetValuesBatch: 0.88576 s
>> repeated calls to MatSetValues: 0.76654 s
>>
>> Poisson on 500x500 unit square, 2nd order Lagrange elements:
>> MatSetValuesBatch: 0.9324 s
>> repeated calls to MatSetValues: 0.81644 s
>>
>> Poisson on 300x300 unit square, 3rd order Lagrange elements:
>> MatSetValuesBatch: 0.93988 s
>> repeated calls to MatSetValues: 1.03884 s
>>
>> As you can see, the two methods take almost the same amount of time. What behavior and performance should we expect? Is there any way to optimize the performance of batched assembly?
>>
> Almost certainly it is not dispatching to the CUDA version. The regular version just calls MatSetValues() in a loop. Are you using a SEQAIJCUSP matrix?
>
Yes. The same matrices yield a speedup of 4-6x when solving the system on the GPU.
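For concreteness, here is a minimal sketch of the batched CUSP assembly path being discussed. The sizes, element arrays, and preallocation figure are placeholders (not the actual FEniCS wrapper code), and it assumes a 2011-era petsc-dev build with CUDA/CUSP enabled:

/* Minimal sketch: batched assembly into a SEQAIJCUSP matrix.  Ne element
 * matrices, each Nl x Nl; elemRows holds the Ne*Nl global row indices and
 * elemMats holds the Ne*Nl*Nl values, element after element.  All sizes
 * and arrays here are placeholders for what the wrapper layer supplies. */
#include <petscmat.h>

PetscErrorCode AssembleBatched(MPI_Comm comm, PetscInt n, PetscInt Ne, PetscInt Nl,
                               PetscInt elemRows[], PetscScalar elemMats[], Mat *A)
{
  PetscErrorCode ierr;

  PetscFunctionBegin;
  ierr = MatCreate(comm, A);CHKERRQ(ierr);
  ierr = MatSetSizes(*A, n, n, n, n);CHKERRQ(ierr);
  /* The CUSP type is what lets MatSetValuesBatch take the GPU path; with a
   * plain MATSEQAIJ matrix it falls back to a loop over MatSetValues. */
  ierr = MatSetType(*A, MATSEQAIJCUSP);CHKERRQ(ierr);
  /* Preallocation figure is a placeholder; it mainly matters for the CPU fallback. */
  ierr = MatSeqAIJSetPreallocation(*A, 9, NULL);CHKERRQ(ierr);

  /* One call replaces the per-element MatSetValues loop. */
  ierr = MatSetValuesBatch(*A, Ne, Nl, elemRows, elemMats);CHKERRQ(ierr);

  ierr = MatAssemblyBegin(*A, MAT_FINAL_ASSEMBLY);CHKERRQ(ierr);
  ierr = MatAssemblyEnd(*A, MAT_FINAL_ASSEMBLY);CHKERRQ(ierr);
  PetscFunctionReturn(0);
}

With anything other than a SEQAIJCUSP matrix, MatSetValuesBatch simply loops over MatSetValues, which would explain timings that track the old implementation.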
>> I also have a problem with Thrust throwing std::bad_alloc on some calls to MatSetValuesBatch. The exception originates in thrust::device_ptr<void> thrust::detail::device::cuda::malloc<0u>(unsigned long). It seems to be thrown when the number of double values I send to MatSetValuesBatch approaches 30 million. I am testing this on a laptop with 4 GB RAM and a GeForce 540M (1 GB memory), so 30 million doubles are far from exhausting my memory on either the host or the device side. Any clues as to what causes this problem and how to avoid it?
>>
> It uses more memory than just the values. I would have to look at the specific case, but I assume that the memory is exhausted.
>
OK, I can look further into it myself as well.

Thanks,

Fredrik
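On the bad_alloc question, a rough back-of-envelope sketch of the device memory involved. The intermediates assumed below (expanded row and column indices plus roughly one scratch copy for the GPU sort/reduce) are an assumption about the CUSP assembly path, not a statement of what PETSc actually allocates:

/* Back-of-envelope device-memory estimate for sending 30 million values to
 * the batched GPU assembly.  The breakdown of intermediates is assumed, not
 * measured from PETSc. */
#include <stdio.h>

int main(void)
{
  const double nvals   = 30e6;             /* doubles sent to MatSetValuesBatch    */
  const double values  = nvals * 8;        /* 8-byte scalars                       */
  const double indices = 2 * nvals * 4;    /* assumed row + column index per value */
  const double scratch = values + indices; /* assumed sort/scan temporaries        */
  printf("estimated peak device use: %.0f MB of 1024 MB\n",
         (values + indices + scratch) / (1024.0 * 1024.0));
  return 0;
}

On an estimate of that kind, 30 million values already approach the 1 GB on a GeForce 540M, part of which the display is also using, so Thrust running out of device memory well before the host does is plausible.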
