I did not configure hypre manually, so I guess it is not using GPUs.

On Fri, Nov 2, 2018 at 2:40 PM Smith, Barry F. <bsm...@mcs.anl.gov> wrote:
>
>
> > On Nov 2, 2018, at 1:25 PM, Mark Adams <mfad...@lbl.gov> wrote:
> >
> > And I just tested it with GAMG and it seems fine. And hypre ran, but it is not clear that it used GPUs....
>
> Presumably hypre must be configured to use GPUs. Currently the PETSc hypre download installer hypre.py doesn't have any options for getting hypre built for GPUs.
>
> Barry
>
> >
> > 14:13 master= ~/petsc/src/snes/examples/tutorials$ jsrun -n 1 ./ex19 -dm_vec_type cuda -dm_mat_type aijcusparse -pc_type hypre -ksp_type fgmres -snes_monitor_short -snes_rtol 1.e-5 -ksp_view
> > lid velocity = 0.0625, prandtl # = 1., grashof # = 1.
> >   0 SNES Function norm 0.239155
> > KSP Object: 1 MPI processes
> >   type: fgmres
> >     restart=30, using Classical (unmodified) Gram-Schmidt Orthogonalization with no iterative refinement
> >     happy breakdown tolerance 1e-30
> >   maximum iterations=10000, initial guess is zero
> >   tolerances: relative=1e-05, absolute=1e-50, divergence=10000.
> >   right preconditioning
> >   using UNPRECONDITIONED norm type for convergence test
> > PC Object: 1 MPI processes
> >   type: hypre
> >     HYPRE BoomerAMG preconditioning
> >       Cycle type V
> >       Maximum number of levels 25
> >       Maximum number of iterations PER hypre call 1
> >       Convergence tolerance PER hypre call 0.
> >       Threshold for strong coupling 0.25
> >       Interpolation truncation factor 0.
> >       Interpolation: max elements per row 0
> >       Number of levels of aggressive coarsening 0
> >       Number of paths for aggressive coarsening 1
> >       Maximum row sums 0.9
> >       Sweeps down 1
> >       Sweeps up 1
> >       Sweeps on coarse 1
> >       Relax down symmetric-SOR/Jacobi
> >       Relax up symmetric-SOR/Jacobi
> >       Relax on coarse Gaussian-elimination
> >       Relax weight (all) 1.
> >       Outer relax weight (all) 1.
> >       Using CF-relaxation
> >       Not using more complex smoothers.
> >       Measure type local
> >       Coarsen type Falgout
> >       Interpolation type classical
> >   linear system matrix = precond matrix:
> >   Mat Object: 1 MPI processes
> >     type: seqaijcusparse
> >     rows=64, cols=64, bs=4
> >     total: nonzeros=1024, allocated nonzeros=1024
> >     total number of mallocs used during MatSetValues calls =0
> >       using I-node routines: found 16 nodes, limit used is 5
> >   1 SNES Function norm 6.80716e-05
> > KSP Object: 1 MPI processes
> >   type: fgmres
> >     restart=30, using Classical (unmodified) Gram-Schmidt Orthogonalization with no iterative refinement
> >     happy breakdown tolerance 1e-30
> >   maximum iterations=10000, initial guess is zero
> >   tolerances: relative=1e-05, absolute=1e-50, divergence=10000.
> >   right preconditioning
> >   using UNPRECONDITIONED norm type for convergence test
> > PC Object: 1 MPI processes
> >   type: hypre
> >     HYPRE BoomerAMG preconditioning
> >       Cycle type V
> >       Maximum number of levels 25
> >       Maximum number of iterations PER hypre call 1
> >       Convergence tolerance PER hypre call 0.
> >       Threshold for strong coupling 0.25
> >       Interpolation truncation factor 0.
> >       Interpolation: max elements per row 0
> >       Number of levels of aggressive coarsening 0
> >       Number of paths for aggressive coarsening 1
> >       Maximum row sums 0.9
> >       Sweeps down 1
> >       Sweeps up 1
> >       Sweeps on coarse 1
> >       Relax down symmetric-SOR/Jacobi
> >       Relax up symmetric-SOR/Jacobi
> >       Relax on coarse Gaussian-elimination
> >       Relax weight (all) 1.
> >       Outer relax weight (all) 1.
> >       Using CF-relaxation
> >       Not using more complex smoothers.
> >       Measure type local
> >       Coarsen type Falgout
> >       Interpolation type classical
> >   linear system matrix = precond matrix:
> >   Mat Object: 1 MPI processes
> >     type: seqaijcusparse
> >     rows=64, cols=64, bs=4
> >     total: nonzeros=1024, allocated nonzeros=1024
> >     total number of mallocs used during MatSetValues calls =0
> >       using I-node routines: found 16 nodes, limit used is 5
> >   2 SNES Function norm 4.093e-11
> > Number of SNES iterations = 2
> >
> >
> > On Fri, Nov 2, 2018 at 2:10 PM Smith, Barry F. <bsm...@mcs.anl.gov> wrote:
> >
> >
> > > On Nov 2, 2018, at 1:03 PM, Mark Adams <mfad...@lbl.gov> wrote:
> > >
> > > FYI, I seem to have the new GPU machine at ORNL (summitdev) working with GPUs. That is good enough for now.
> > > Thanks,
> >
> > Excellent!
> >
> > >
> > > 14:00 master= ~/petsc/src/snes/examples/tutorials$ jsrun -n 1 ./ex19 -dm_vec_type cuda -dm_mat_type aijcusparse -pc_type none -ksp_type fgmres -snes_monitor_short -snes_rtol 1.e-5 -ksp_view
> > > lid velocity = 0.0625, prandtl # = 1., grashof # = 1.
> > >   0 SNES Function norm 0.239155
> > > KSP Object: 1 MPI processes
> > >   type: fgmres
> > >     restart=30, using Classical (unmodified) Gram-Schmidt Orthogonalization with no iterative refinement
> > >     happy breakdown tolerance 1e-30
> > >   maximum iterations=10000, initial guess is zero
> > >   tolerances: relative=1e-05, absolute=1e-50, divergence=10000.
> > >   right preconditioning
> > >   using UNPRECONDITIONED norm type for convergence test
> > > PC Object: 1 MPI processes
> > >   type: none
> > >   linear system matrix = precond matrix:
> > >   Mat Object: 1 MPI processes
> > >     type: seqaijcusparse
> > >     rows=64, cols=64, bs=4
> > >     total: nonzeros=1024, allocated nonzeros=1024
> > >     total number of mallocs used during MatSetValues calls =0
> > >       using I-node routines: found 16 nodes, limit used is 5
> > >   1 SNES Function norm 6.82338e-05
> > > KSP Object: 1 MPI processes
> > >   type: fgmres
> > >     restart=30, using Classical (unmodified) Gram-Schmidt Orthogonalization with no iterative refinement
> > >     happy breakdown tolerance 1e-30
> > >   maximum iterations=10000, initial guess is zero
> > >   tolerances: relative=1e-05, absolute=1e-50, divergence=10000.
> > >   right preconditioning
> > >   using UNPRECONDITIONED norm type for convergence test
> > > PC Object: 1 MPI processes
> > >   type: none
> > >   linear system matrix = precond matrix:
> > >   Mat Object: 1 MPI processes
> > >     type: seqaijcusparse
> > >     rows=64, cols=64, bs=4
> > >     total: nonzeros=1024, allocated nonzeros=1024
> > >     total number of mallocs used during MatSetValues calls =0
> > >       using I-node routines: found 16 nodes, limit used is 5
> > >   2 SNES Function norm 3.346e-10
> > > Number of SNES iterations = 2
> > > 14:01 master= ~/petsc/src/snes/examples/tutorials$
> > >
> > >
> > >
> > > On Thu, Nov 1, 2018 at 9:33 AM Mark Adams <mfad...@lbl.gov> wrote:
> > >
> > >
> > > On Wed, Oct 31, 2018 at 12:30 PM Mark Adams <mfad...@lbl.gov> wrote:
> > >
> > >
> > > On Wed, Oct 31, 2018 at 6:59 AM Karl Rupp <r...@iue.tuwien.ac.at> wrote:
> > > Hi Mark,
> > >
> > > ah, I was confused by the Python information at the beginning of
> > > configure.log. So it is picking up the correct compiler.
> > >
> > > Have you tried uncommenting the check for GNU?
> > >
> > > Yes, but I am getting an error that the cuda files do not find mpi.h.
> > >
> > >
> > > I'm getting a make error.
> > >
> > > Thanks,
> > >
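
One way to check what the runs quoted in this thread actually used is to scan the captured -ksp_view output for the "type:" line under each KSP, PC, and Mat object. The following is a minimal sketch in Python, assuming the output has been saved as text; the function name view_types and the shortened sample string are hypothetical and not part of PETSc. It only reports the PETSc object types: a seqaijcusparse matrix does not by itself show whether hypre ran on the GPU, which is the open question above.

```python
import re


def view_types(ksp_view_text):
    """Collect the 'type:' value reported under each KSP/PC/Mat object header."""
    types = {}
    current = None
    for raw in ksp_view_text.splitlines():
        # Drop mail quoting markers ('>') and surrounding whitespace.
        line = raw.lstrip("> ").strip()
        header = re.match(r"(KSP|PC|Mat) Object:", line)
        if header:
            current = header.group(1)
            continue
        # The first 'type:' line after a header names that object's type.
        if current and line.startswith("type:"):
            types.setdefault(current, set()).add(line.split(":", 1)[1].strip())
            current = None
    return types


# Shortened sample modeled on the -ksp_view output quoted in this thread.
sample = """\
> > KSP Object: 1 MPI processes
> >   type: fgmres
> > PC Object: 1 MPI processes
> >   type: hypre
> >   linear system matrix = precond matrix:
> >   Mat Object: 1 MPI processes
> >     type: seqaijcusparse
"""

found = view_types(sample)
print(found)

# True only means the PETSc matrix is stored in a cuSPARSE format.
uses_cusparse_mat = any("cusparse" in t for t in found.get("Mat", ()))
print("Mat stored in a cuSPARSE format:", uses_cusparse_mat)
```

Because the parser strips leading quote markers, the same function can be pointed at the log text pasted in the messages above as well as at raw terminal output.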