OK, many thanks Barry. For the cpu:sockets binding I get an ugly error:
[valera@ocean petsc]$ make streams NPMAX=4 MPI_BINDING="--binding cpu:sockets"
cd src/benchmarks/streams; /usr/bin/gmake --no-print-directory PETSC_DIR=/home/valera/petsc PETSC_ARCH=arch-linux2-c-debug streams
/home/valera/petsc/arch-linux2-c-debug/bin/mpicc -o MPIVersion.o -c -Wall -Wwrite-strings -Wno-strict-aliasing -Wno-unknown-pragmas -fvisibility=hidden -g -O -I/home/valera/petsc/include -I/home/valera/petsc/arch-linux2-c-debug/include `pwd`/MPIVersion.c
Running streams with '/home/valera/petsc/arch-linux2-c-debug/bin/mpiexec --binding cpu:sockets' using 'NPMAX=4'
[proxy:0:0@ocean] handle_bitmap_binding (tools/topo/hwloc/topo_hwloc.c:203): unrecognized binding string "cpu:sockets"
[proxy:0:0@ocean] HYDT_topo_hwloc_init (tools/topo/hwloc/topo_hwloc.c:415): error binding with bind "cpu:sockets" and map "(null)"
[proxy:0:0@ocean] HYDT_topo_init (tools/topo/topo.c:62): unable to initialize hwloc
[proxy:0:0@ocean] launch_procs (pm/pmiserv/pmip_cb.c:515): unable to initialize process topology
[proxy:0:0@ocean] HYD_pmcd_pmip_control_cmd_cb (pm/pmiserv/pmip_cb.c:892): launch_procs returned error
[proxy:0:0@ocean] HYDT_dmxu_poll_wait_for_event (tools/demux/demux_poll.c:76): callback returned error status
[proxy:0:0@ocean] main (pm/pmiserv/pmip.c:206): demux engine error waiting for event
[mpiexec@ocean] control_cb (pm/pmiserv/pmiserv_cb.c:200): assert (!closed) failed
[mpiexec@ocean] HYDT_dmxu_poll_wait_for_event (tools/demux/demux_poll.c:76): callback returned error status
[mpiexec@ocean] HYD_pmci_wait_for_completion (pm/pmiserv/pmiserv_pmci.c:198): error waiting for event
[mpiexec@ocean] main (ui/mpich/mpiexec.c:344): process manager error waiting for completion
------------------------------------------------

I'm sending the binary file for the other list in a separate mail next.

Regards,

On Sun, Jan 8, 2017 at 4:05 PM, Barry Smith <[email protected]> wrote:

>
> Manuel,
>
> OK, there are two (actually 3) distinct things you need to deal with to get any kind of performance out of this machine.
>
> 0) When running on the machine you cannot share it with other people's jobs or you will get timings all over the place, so run streams and benchmarks of your code when no one else has jobs running (the Unix top command helps).
>
> 1) mpiexec is making bad decisions about process binding (which MPI processes are bound/assigned to which cores).
>
> From streams you have
>
>   np  speedup
>    1  1.0
>    2  1.95
>    3  0.57
>    4  0.6
>    5  2.79
>    6  2.8
>    7  2.74
>    8  2.67
>    9  2.55
>   10  2.68
>   .....
>
> This is nuts. When going from 2 to 3 processes the performance goes WAY down.
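The pattern Barry points out (speedup collapsing at 3 processes instead of leveling off) can be checked mechanically. The sketch below is a hypothetical helper, not part of PETSc or the streams benchmark: it takes the (np, speedup) pairs copied from the table above (only the rows shown; the rows elided with "....." are not included) and flags any process count whose speedup falls more than a chosen fraction below the best speedup seen at a smaller count. The 20% threshold is an arbitrary assumption; following Barry's point, a run on an idle machine with a good binding should ideally flag nothing and simply plateau.

```python
from typing import Dict, List

# (np -> speedup) pairs copied from the streams table quoted above;
# the rows elided with "....." in the original are not included.
STREAMS_SPEEDUP: Dict[int, float] = {
    1: 1.0, 2: 1.95, 3: 0.57, 4: 0.6, 5: 2.79,
    6: 2.8, 7: 2.74, 8: 2.67, 9: 2.55, 10: 2.68,
}


def suspicious_process_counts(speedup: Dict[int, float], drop: float = 0.2) -> List[int]:
    """Return process counts whose speedup is more than `drop` (a fraction)
    below the best speedup measured at any smaller process count.

    Bandwidth-bound speedup should level off rather than fall, so flagged
    counts hint at a binding or machine-sharing problem. The default
    threshold is an arbitrary choice for illustration.
    """
    flagged: List[int] = []
    best_so_far = 0.0
    for nproc in sorted(speedup):
        value = speedup[nproc]
        if best_so_far > 0.0 and value < (1.0 - drop) * best_so_far:
            flagged.append(nproc)
        best_so_far = max(best_so_far, value)
    return flagged


if __name__ == "__main__":
    print(suspicious_process_counts(STREAMS_SPEEDUP))  # [3, 4]
```

Comparing against the best earlier value, rather than only the previous row, keeps np=4 flagged even though 0.6 is slightly above 0.57.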
>
> If the machine is empty and MPI did a good assignment of processes to cores, the speedup should not go down for more cores; it should just stagnate.
>
> So you need to find out how to do process binding with your MPI; see http://www.mcs.anl.gov/petsc/documentation/faq.html#computers and the links from there. You can run the streams test with binding, for example:
>
>   make streams NPMAX=4 MPI_BINDING="--binding cpu:sockets"
>
> Once you have a good binding for your MPI, make sure you always run mpiexec with that binding when running your code.
>
> 2) Both preconditioners you have tried for your problem are terrible. With block Jacobi it went from 156 linear iterations (for 5 linear solves) to 546 iterations. With AMG it went from 1463!! iterations to 1760. These are huge numbers of iterations for algebraic multigrid!
>
> For some reason AMG doesn't like your pressure matrix (even though AMG generally loves pressure matrices). What do you have for boundary conditions for your pressure?
>
> Please run with -ksp_view_mat binary -ksp_view_rhs binary, then send the resulting file binaryoutput to [email protected], and we'll see if we can figure out why AMG doesn't like it.
>
> On Jan 8, 2017, at 4:41 PM, Manuel Valera <[email protected]> wrote:
> >
> > OK, I just did the streams and -log_summary tests. I'm attaching the output for each run: streams with NPMAX=4 and NPMAX=32, and -log_summary runs with and without -pc_type hypre on 1 and 2 cores, all with debugging turned off.
> >
> > The matrix is 200,000 x 200,000, from full curvilinear 3D meshes, for a non-hydrostatic pressure solver.
> >
> > Thanks a lot for your insight,
> >
> > Manuel
> >
> > On Sun, Jan 8, 2017 at 9:48 AM, Barry Smith <[email protected]> wrote:
> > >
> > > We need to see the -log_summary with hypre on 1 and 2 processes (with debugging turned off); we also need to see the output from
> > >
> > >   make streams NPMAX=4
> > >
> > > run in the PETSc directory.
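Barry's request above writes the operator and right-hand side to a file named binaryoutput. If you want to inspect that file yourself before sending it, here is a minimal pure-Python sketch with hypothetical helper names (PETSc's own PetscBinaryIO.py module is the better choice when it is available). It assumes the default PETSc binary layout for this build: big-endian 32-bit integers (the configure line shows no 64-bit-index option) and real double-precision scalars, with each Mat stored as classid, rows, columns, nonzero count, per-row nonzero counts, column indices, values, and each Vec as classid, length, values. Records are returned in file order, since several solves may each append a matrix and a vector.

```python
import struct
from typing import List, Tuple

# Header markers PETSc writes in front of each object in a binary file.
MAT_FILE_CLASSID = 1211216
VEC_FILE_CLASSID = 1211214


def parse_petsc_binary(data: bytes) -> List[Tuple[str, object]]:
    """Parse PETSc binary data (assumed: big-endian int32 indices, real float64).

    Returns records in file order, either
      ("Mat", (nrows, ncols, row_ptr, col_idx, values))  # CSR form
    or
      ("Vec", values).
    """
    records: List[Tuple[str, object]] = []
    pos = 0

    def take(fmt: str) -> tuple:
        nonlocal pos
        # struct.error is raised if the buffer is shorter than the format needs
        items = struct.unpack_from(fmt, data, pos)
        pos += struct.calcsize(fmt)
        return items

    while pos < len(data):
        (classid,) = take(">i")
        if classid == MAT_FILE_CLASSID:
            nrows, ncols, nnz = take(">3i")
            row_lengths = take(f">{nrows}i")
            col_idx = list(take(f">{nnz}i"))
            values = list(take(f">{nnz}d"))
            # Turn per-row nonzero counts into CSR row pointers.
            row_ptr = [0]
            for length in row_lengths:
                row_ptr.append(row_ptr[-1] + length)
            records.append(("Mat", (nrows, ncols, row_ptr, col_idx, values)))
        elif classid == VEC_FILE_CLASSID:
            (n,) = take(">i")
            records.append(("Vec", list(take(f">{n}d"))))
        else:
            raise ValueError(f"unsupported classid {classid} at byte {pos - 4}")
    return records


def load_petsc_binary(path: str = "binaryoutput") -> List[Tuple[str, object]]:
    """Read a whole PETSc binary file from disk and parse it."""
    with open(path, "rb") as fh:
        return parse_petsc_binary(fh.read())
```

For the 200,000-row matrix in this thread (3,373,340 nonzeros per the -ksp_view output quoted further down), this builds Python lists with a few million entries, which is slow but workable; numpy.frombuffer with big-endian dtypes would be the faster route.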
> > >
> > > On Jan 7, 2017, at 7:38 PM, Manuel Valera <[email protected]> wrote:
> > > >
> > > > OK great, I tried those command-line args and this is the result.
> > > >
> > > > When I use -pc_type gamg:
> > > >
> > > > [1]PETSC ERROR: --------------------- Error Message --------------------------------------------------------------
> > > > [1]PETSC ERROR: Petsc has generated inconsistent data
> > > > [1]PETSC ERROR: Have un-symmetric graph (apparently). Use '-pc_gamg_sym_graph true' to symetrize the graph or '-pc_gamg_threshold -1.0' if the matrix is structurally symmetric.
> > > > [1]PETSC ERROR: See http://www.mcs.anl.gov/petsc/documentation/faq.html for trouble shooting.
> > > > [1]PETSC ERROR: Petsc Release Version 3.7.4, unknown
> > > > [1]PETSC ERROR: ./ucmsMR on a arch-linux2-c-debug named ocean by valera Sat Jan 7 17:35:05 2017
> > > > [1]PETSC ERROR: Configure options --with-cc=gcc --with-cxx=g++ --with-fc=gfortran --download-fblaslapack --download-mpich --download-hdf5 --download-netcdf --download-hypre --download-metis --download-parmetis --download-trillinos
> > > > [1]PETSC ERROR: #1 smoothAggs() line 462 in /usr/dataC/home/valera/petsc/src/ksp/pc/impls/gamg/agg.c
> > > > [1]PETSC ERROR: #2 PCGAMGCoarsen_AGG() line 998 in /usr/dataC/home/valera/petsc/src/ksp/pc/impls/gamg/agg.c
> > > > [1]PETSC ERROR: #3 PCSetUp_GAMG() line 571 in /usr/dataC/home/valera/petsc/src/ksp/pc/impls/gamg/gamg.c
> > > > [1]PETSC ERROR: #4 PCSetUp() line 968 in /usr/dataC/home/valera/petsc/src/ksp/pc/interface/precon.c
> > > > [1]PETSC ERROR: #5 KSPSetUp() line 390 in /usr/dataC/home/valera/petsc/src/ksp/ksp/interface/itfunc.c
> > > > application called MPI_Abort(comm=0x84000002, 77) - process 1
> > > >
> > > > When I use -pc_type gamg and -pc_gamg_sym_graph true:
> > > >
> > > > ------------------------------------------------------------------------
> > > > [0]PETSC ERROR: Caught signal number 8 FPE: Floating Point Exception,probably divide by zero
> > > > [0]PETSC ERROR: Try option -start_in_debugger or -on_error_attach_debugger
> > > > [0]PETSC ERROR: or see http://www.mcs.anl.gov/petsc/documentation/faq.html#valgrind
> > > > [0]PETSC ERROR: or try http://valgrind.org on GNU/linux and Apple Mac OS X to find memory corruption errors
> > > > [1]PETSC ERROR: ------------------------------------------------------------------------
> > > > [1]PETSC ERROR: --------------------- Stack Frames ------------------------------------
> > > > [1]PETSC ERROR: Note: The EXACT line numbers in the stack are not available,
> > > > [1]PETSC ERROR: INSTEAD the line number of the start of the function
> > > > [1]PETSC ERROR: is given.
> > > > [1]PETSC ERROR: [1] LAPACKgesvd line 42 /usr/dataC/home/valera/petsc/src/ksp/ksp/impls/gmres/gmreig.c
> > > > [1]PETSC ERROR: [1] KSPComputeExtremeSingularValues_GMRES line 24 /usr/dataC/home/valera/petsc/src/ksp/ksp/impls/gmres/gmreig.c
> > > > [1]PETSC ERROR: [1] KSPComputeExtremeSingularValues line 51 /usr/dataC/home/valera/petsc/src/ksp/ksp/interface/itfunc.c
> > > > [1]PETSC ERROR: [1] PCGAMGOptProlongator_AGG line 1187 /usr/dataC/home/valera/petsc/src/ksp/pc/impls/gamg/agg.c
> > > > [1]PETSC ERROR: [1] PCSetUp_GAMG line 472 /usr/dataC/home/valera/petsc/src/ksp/pc/impls/gamg/gamg.c
> > > > [1]PETSC ERROR: [1] PCSetUp line 930 /usr/dataC/home/valera/petsc/src/ksp/pc/interface/precon.c
> > > > [1]PETSC ERROR: [1] KSPSetUp line 305 /usr/dataC/home/valera/petsc/src/ksp/ksp/interface/itfunc.c
> > > > [0] PCGAMGOptProlongator_AGG line 1187 /usr/dataC/home/valera/petsc/src/ksp/pc/impls/gamg/agg.c
> > > > [0]PETSC ERROR: [0] PCSetUp_GAMG line 472 /usr/dataC/home/valera/petsc/src/ksp/pc/impls/gamg/gamg.c
> > > > [0]PETSC ERROR: [0] PCSetUp line 930 /usr/dataC/home/valera/petsc/src/ksp/pc/interface/precon.c
> > > > [0]PETSC ERROR: [0] KSPSetUp line 305 /usr/dataC/home/valera/petsc/src/ksp/ksp/interface/itfunc.c
> > > > [0]PETSC ERROR: --------------------- Error Message --------------------------------------------------------------
> > > >
> > > > When I use -pc_type hypre it actually shows something different in -ksp_view:
> > > >
> > > > KSP Object: 2 MPI processes
> > > >   type: gcr
> > > >     GCR: restart = 30
> > > >     GCR: restarts performed = 37
> > > >   maximum iterations=10000, initial guess is zero
> > > >   tolerances: relative=1e-14, absolute=1e-50, divergence=10000.
> > > >   right preconditioning
> > > >   using UNPRECONDITIONED norm type for convergence test
> > > > PC Object: 2 MPI processes
> > > >   type: hypre
> > > >     HYPRE BoomerAMG preconditioning
> > > >     HYPRE BoomerAMG: Cycle type V
> > > >     HYPRE BoomerAMG: Maximum number of levels 25
> > > >     HYPRE BoomerAMG: Maximum number of iterations PER hypre call 1
> > > >     HYPRE BoomerAMG: Convergence tolerance PER hypre call 0.
> > > >     HYPRE BoomerAMG: Threshold for strong coupling 0.25
> > > >     HYPRE BoomerAMG: Interpolation truncation factor 0.
> > > >     HYPRE BoomerAMG: Interpolation: max elements per row 0
> > > >     HYPRE BoomerAMG: Number of levels of aggressive coarsening 0
> > > >     HYPRE BoomerAMG: Number of paths for aggressive coarsening 1
> > > >     HYPRE BoomerAMG: Maximum row sums 0.9
> > > >     HYPRE BoomerAMG: Sweeps down 1
> > > >     HYPRE BoomerAMG: Sweeps up 1
> > > >     HYPRE BoomerAMG: Sweeps on coarse 1
> > > >     HYPRE BoomerAMG: Relax down symmetric-SOR/Jacobi
> > > >     HYPRE BoomerAMG: Relax up symmetric-SOR/Jacobi
> > > >     HYPRE BoomerAMG: Relax on coarse Gaussian-elimination
> > > >     HYPRE BoomerAMG: Relax weight (all) 1.
> > > >     HYPRE BoomerAMG: Outer relax weight (all) 1.
> > > >     HYPRE BoomerAMG: Using CF-relaxation
> > > >     HYPRE BoomerAMG: Not using more complex smoothers.
> > > >     HYPRE BoomerAMG: Measure type local
> > > >     HYPRE BoomerAMG: Coarsen type Falgout
> > > >     HYPRE BoomerAMG: Interpolation type classical
> > > >     HYPRE BoomerAMG: Using nodal coarsening (with HYPRE_BOOMERAMGSetNodal() 1
> > > >     HYPRE BoomerAMG: HYPRE_BoomerAMGSetInterpVecVariant() 1
> > > >   linear system matrix = precond matrix:
> > > >   Mat Object: 2 MPI processes
> > > >     type: mpiaij
> > > >     rows=200000, cols=200000
> > > >     total: nonzeros=3373340, allocated nonzeros=3373340
> > > >     total number of mallocs used during MatSetValues calls =0
> > > >       not using I-node (on process 0) routines
> > > >
> > > > But the timing is still terrible.
> > > >
> > > > On Sat, Jan 7, 2017 at 5:28 PM, Jed Brown <[email protected]> wrote:
> > > > > Manuel Valera <[email protected]> writes:
> > > > >
> > > > > > Awesome Matt and Jed,
> > > > > >
> > > > > > The GCR is used because the matrix is not invertible and because this was the algorithm that the previous library used.
> > > > > >
> > > > > > The preconditioner I'm aiming to use is multigrid. I thought I configured the hypre BoomerAMG solver for this, but I agree that it doesn't show in the log anywhere; how can I be sure it is being used? I sent the -ksp_view log earlier in this thread.
> > > > >
> > > > > Did you run with -pc_type hypre?
> > > > >
> > > > > > I had a problem with the matrix block sizes, so I couldn't make the PETSc native multigrid solver work.
> > > > >
> > > > > What block sizes? If the only variable is pressure, the block size would be 1 (default).
> > > > >
> > > > > > This is a non-hydrostatic pressure solver; it is an elliptic problem, so multigrid is a must.
> > > > >
> > > > > Yes, multigrid should work well.
> >
> > <logsumm1hypre.txt><logsumm1jacobi.txt><logsumm2hypre.txt><logsumm2jacobi.txt><steams4.txt><steams32.txt>
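Several symptoms in this thread can be checked directly on the assembled matrix (for example one read back from binaryoutput): GAMG's complaint about an "un-symmetric graph", which concerns the nonzero pattern, and the choice of GCR "because the matrix is not invertible". The sketch below uses hypothetical helper names and takes a matrix in CSR form (row_ptr, col_idx, values). It checks whether the pattern is structurally symmetric (per the GAMG message, that decides between -pc_gamg_sym_graph true and -pc_gamg_threshold -1.0); what fraction of rows sum to (nearly) zero, since if all do, A times the constant vector is zero, as for a pressure operator with Neumann conditions everywhere, which bears on Barry's boundary-condition question; and which rows have a zero diagonal, since any diagonal (Jacobi-type) scaling divides by it and such rows are one possible source of a divide-by-zero like the FPE above. These are diagnostics only, not a diagnosis of this particular matrix.

```python
from typing import List, Sequence


def structurally_symmetric(nrows: int, row_ptr: Sequence[int], col_idx: Sequence[int]) -> bool:
    """True if every stored entry (i, j) has a stored partner (j, i)."""
    pattern = set()
    for i in range(nrows):
        for k in range(row_ptr[i], row_ptr[i + 1]):
            pattern.add((i, col_idx[k]))
    return all((j, i) in pattern for (i, j) in pattern)


def zero_row_sum_fraction(nrows: int, row_ptr: Sequence[int], values: Sequence[float],
                          rtol: float = 1e-12) -> float:
    """Fraction of rows whose sum is zero relative to the row's absolute sum.

    Row i of A times the all-ones vector equals the sum of row i, so a result
    of 1.0 means the constant vector lies in the null space of A.
    """
    if nrows == 0:
        return 0.0
    count = 0
    for i in range(nrows):
        row = values[row_ptr[i]:row_ptr[i + 1]]
        if abs(sum(row)) <= rtol * sum(abs(v) for v in row):
            count += 1
    return count / nrows


def zero_diagonal_rows(nrows: int, row_ptr: Sequence[int], col_idx: Sequence[int],
                       values: Sequence[float]) -> List[int]:
    """Rows whose diagonal entry is missing or exactly zero."""
    bad: List[int] = []
    for i in range(nrows):
        diag = 0.0
        for k in range(row_ptr[i], row_ptr[i + 1]):
            if col_idx[k] == i:
                diag += values[k]
        if diag == 0.0:
            bad.append(i)
    return bad


if __name__ == "__main__":
    # Hypothetical 3x3 example: 1D Laplacian with Neumann conditions at both ends.
    row_ptr = [0, 2, 5, 7]
    col_idx = [0, 1, 0, 1, 2, 1, 2]
    values = [1.0, -1.0, -1.0, 2.0, -1.0, -1.0, 1.0]
    print(structurally_symmetric(3, row_ptr, col_idx))      # True
    print(zero_row_sum_fraction(3, row_ptr, values))        # 1.0
    print(zero_diagonal_rows(3, row_ptr, col_idx, values))  # []
```

The set-based symmetry check trades memory for simplicity; for a few million nonzeros it needs a few hundred megabytes, which is acceptable for a one-off check.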
