FYI, I've built Barry's updates on SUMMIT and tested them on SUMMITDEV (I can't run on SUMMIT right now). The branch has been merged into master.
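For reference, a CUDA-enabled PETSc build follows the usual configure-then-make pattern. A minimal sketch; the compiler wrappers and flags below are generic placeholders, not the exact settings used on SUMMIT/SUMMITDEV:

    # Minimal sketch of a CUDA-enabled PETSc build. The MPI compiler
    # wrappers here are placeholders; a real Summit build would use the
    # system's modules and wrappers.
    ./configure --with-cuda=1 --with-cudac=nvcc \
                --with-cc=mpicc --with-cxx=mpicxx --with-fc=mpif90 \
                --with-debugging=0
    make all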
This is how you run the CUDA tests (from the PETSc root directory):

    make -f gmakefile.test test globsearch="snes*tutorials*ex19*cuda*"

Mark

On Thu, Sep 20, 2018 at 5:06 PM Smith, Barry F. <[email protected]> wrote:
>
>    Brian,
>
>    I have finished making the (relatively few) changes needed to get PETSc's GAMG to run on a combination of the CPU and GPU. Any of the AMG kernels that has a CUDA backend is run automatically on the GPU, while the kernels without a CUDA backend are run on the CPU. In particular, the "solve" portion (Chebyshev/Jacobi smoothing, coarse-grid restriction and interpolation) will run on the GPU, as will part of the AMG "setup".
>
>    This is in the branch barry/mpiaijcusparse-better-subclass-mpiaij, which will hopefully be in the master branch tomorrow if it passes the full test suite tonight. I see Mark is already attempting to build PETSc on Summit and can hopefully quickly determine whether the branch works. (Mark, since Summit is presumably a batch system, you will need to run the last two test cases listed in src/snes/examples/tutorials/ex19.c by setting up the appropriate batch file and including the appropriate PETSc command line options.)
>
>    We look forward to hearing how it functions, and in particular would love to receive -log_view performance output on Summit comparing the use of the GPU with simply running on the CPU for your application. This would also tell us what additional kernels, if any, should be ported to a CUDA backend.
>
>    Barry
>
>
> On Sep 19, 2018, at 4:43 PM, Mills, Richard Tran <[email protected]> wrote:
> >
> > Hi Brian,
> >
> > Your message to petsc-dev has prompted some ongoing discussion among the core PETSc developers, and we'll hopefully be able to give you an outline of a coherent plan to help you meet your ECP milestones soon.
> >
> > We have had adding GPU support to PETSc's GAMG preconditioner on our list of goals for some time, but we didn't manage to get it into the recent 3.10 release. We can bump up its priority and, as Jed has said, we should be able to provide AMG setup on the CPU and the solves on the GPU in relatively short order, and we can see how much that helps in the near term. Doing the setup on the GPU is much more involved, but is something that we are interested in doing.
> >
> > Just wanted to let you know that your query has not gone unnoticed. Expect a more detailed reply from us soon.
> >
> > Best regards,
> > Richard
> >
> > On Wed, Sep 19, 2018 at 11:43 AM Jed Brown <[email protected]> wrote:
> > Brian, how frequently do you need to update the matrix (and thus rebuild the preconditioner)?
> >
> > If it is infrequent, we could (in the near term) provide AMG setup on the CPU with solves on the GPU.
> >
> > What is your typical problem size per node to be run on Summit? What is your MPI/OpenMP(?) decomposition?
> >
> > Are these heterogeneous Poisson solves, or are the equations to be solved implicitly more complicated? Do you have experimental information about relative convergence rates/grid complexity/strong scalability for your operator solved using classical AMG (e.g., Hypre) versus smoothed (or plain) aggregation (ML, GAMG default)?
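Jed's last question above, comparing classical AMG against smoothed aggregation, can be checked from the command line without code changes. A sketch, assuming PETSc was configured with Hypre and ML support, and with ./app standing in for the application binary (a placeholder):

    # Classical AMG via Hypre's BoomerAMG:
    mpiexec -n 4 ./app -pc_type hypre -pc_hypre_type boomeramg -ksp_monitor
    # Smoothed aggregation via PETSc's native GAMG (agg is the default type):
    mpiexec -n 4 ./app -pc_type gamg -ksp_monitor
    # Smoothed aggregation via Trilinos/ML:
    mpiexec -n 4 ./app -pc_type ml -ksp_monitor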
> > Brian Van Straalen <[email protected]> writes:
> >
> > > So Baky and I have been at the Brookhaven GPU Hackathon for three days now, talking to everyone. We have also been emailing with the people who will respond to us from the hypre team and the PETSc team, as well as reading every blog post, mail archive, and message board, and from what we can tell, a distributed AMG preconditioner will not be available for us on a Summit platform for the foreseeable future.
> > >
> > > There is a hypre build for CUDA, but it has a problem with its use of CUSP, and nobody seems to be working on it.
> > >
> > > PETSc has some .cu CUDA files for the SpMV and vector operations, but the preconditioners are limited to point Jacobi and similar simple operations, plus a version of ILU. Neither works for our stiff projection in the embedded boundary algorithms. We built it and ran it, and PETSc takes several hundred iterations to get the residual down by a factor of 6. We need to get down to more like 10e-11 for this solve.
> > >
> > > The AMG being worked on by the NVIDIA team is not targeted at multi-node solving, and I haven't heard back from them in months.
> > >
> > > As I see it, we are left with two options to meet our ECP milestones:
> > >
> > > 1. Build yet another interface, this time to see if there is a distributed GPU AMG preconditioner in Trilinos.
> > >
> > > 2. Implement our own special-purpose EB-GMG solver written in Chombo.
> > >
> > > I would love to be wrong about all this.
> > >
> > > Brian
> > >
> > > --
> > > Brian Van Straalen       Lawrence Berkeley Lab
> > > [email protected]        Computational Research
> > > (510) 486-4976           Division (crd.lbl.gov)
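As a concrete starting point for Barry's suggestion about running the ex19 CUDA test cases under Summit's batch system, a job script might look roughly like the following. This is a sketch only: the project ID and resource-set shape are placeholders, and the exact options used by the CUDA test cases should be taken from the test entries in src/snes/examples/tutorials/ex19.c itself:

    #!/bin/bash
    # Hypothetical LSF job script for a single-rank, single-GPU run of ex19.
    #BSUB -P ABC123                 # project allocation (placeholder)
    #BSUB -W 00:10                  # 10-minute wall clock limit
    #BSUB -nnodes 1
    #BSUB -J ex19_gamg_cuda

    # One resource set: 1 MPI rank, 1 core, 1 GPU. CUDA vector/matrix
    # types push the work onto the GPU; -log_view produces the timing
    # breakdown Barry asked for.
    jsrun -n 1 -a 1 -c 1 -g 1 ./ex19 \
        -da_refine 5 -snes_monitor \
        -dm_vec_type cuda -dm_mat_type aijcusparse \
        -pc_type gamg -log_view

The matching CPU baseline for the -log_view comparison is the same jsrun line with the -dm_vec_type cuda and -dm_mat_type aijcusparse options dropped.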
