It was somewhat arbitrary. I want to conduct a performance spectrum (DOFs/sec) study that uses at least 1k processes on various HPC machines (and hopefully one more case with 10k processes). Assuming I use all available cores on each compute node (which I know is not the greatest idea here), 1032 Ivy Bridge cores on Edison (24 cores/node, 43 nodes) best match Cori's 1024 Haswell cores (32 cores/node, 32 nodes).
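For reference, the arithmetic behind those two counts can be checked with a short Python sketch. The function names are hypothetical, and the cores-per-node figures are simply the ones quoted above.

```python
def nodes_needed(total_ranks, cores_per_node):
    """Return the number of fully populated nodes, or None if the ranks do not fill whole nodes."""
    nodes, leftover = divmod(total_ranks, cores_per_node)
    return nodes if leftover == 0 else None


def prime_factors(n):
    """Return the prime factorization of n as a sorted list (trial division)."""
    factors = []
    d = 2
    while d * d <= n:
        while n % d == 0:
            factors.append(d)
            n //= d
        d += 1
    if n > 1:
        factors.append(n)
    return factors


if __name__ == "__main__":
    # Edison: 24 cores/node; Cori Haswell: 32 cores/node (figures from the message above)
    print(nodes_needed(1032, 24), prime_factors(1032))
    print(nodes_needed(1024, 32), prime_factors(1024))
```

Since 1032 contains the prime factor 43, any 3D process grid for 1032 ranks must have one dimension divisible by 43, whereas 1024 is a pure power of two.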

How do I determine the shape of the DMDA? I am guessing the number of MPI processes needs to be compatible with this?

Thanks,
Justin

On Sun, Apr 2, 2017 at 11:29 AM, Jed Brown <[email protected]> wrote:

> Justin Chang <[email protected]> writes:
>
> > Thanks guys,
> >
> > So I want to run SNES ex48 across 1032 processes on Edison,
>
> How did you decide on 1032 processes? What shape did the DMDA produce?
> Of course this should work, but we didn't explicitly test that in the
> paper since we were running on BG/P.
>
> https://github.com/jedbrown/tme-ice/tree/master/shaheen/b
>
> > but I keep getting segmentation violations. These are the parameters I
> > am trying:
> >
> > srun -n 1032 -c 2 ./ex48 -M 80 -N 80 -P 9 -da_refine 1 -pc_type mg
> > -thi_mat_type baij -mg_coarse_pc_type gamg
> >
> > The above works perfectly fine if I used 96 processes. I also tried to use
> > a finer coarse mesh on 1032 but the error persists.
> >
> > Any ideas why this is happening? What are the ideal parameters to use if I
> > want to use 1k+ cores?
> >
> > Thanks,
> > Justin
> >
> > On Fri, Mar 31, 2017 at 12:47 PM, Barry Smith <[email protected]> wrote:
> >
> >> > On Mar 31, 2017, at 10:00 AM, Jed Brown <[email protected]> wrote:
> >> >
> >> > Justin Chang <[email protected]> writes:
> >> >
> >> >> Yeah based on my experiments it seems setting pc_mg_levels to $DAREFINE + 1
> >> >> has decent performance.
> >> >>
> >> >> 1) is there ever a case where you'd want $MGLEVELS <= $DAREFINE? In some of
> >> >> the PETSc tutorial slides (e.g.,
> >> >> http://www.mcs.anl.gov/petsc/documentation/tutorials/TutorialCEMRACS2016.pdf
> >> >> on slide 203/227) they say to use $MGLEVELS = 4 and $DAREFINE = 5, but
> >> >> when I ran this, it was almost twice as slow as if $MGLEVELS >= $DAREFINE
> >> >
> >> > Smaller coarse grids are generally more scalable -- when the problem
> >> > data is distributed, multigrid is a good solution algorithm. But if
> >> > multigrid stops being effective because it is not preserving sufficient
> >> > coarse grid accuracy (e.g., for transport-dominated problems in
> >> > complicated domains) then you might want to stop early and use a more
> >> > robust method (like direct solves).
> >>
> >> Basically for symmetric positive definite operators you can make the
> >> coarse problem as small as you like (even 1 point) in theory. For
> >> indefinite and non-symmetric problems the theory says the "coarse grid must
> >> be sufficiently fine" (loosely speaking the coarse grid has to resolve the
> >> eigenmodes for the eigenvalues to the left of the x = 0).
> >>
> >> https://www.jstor.org/stable/2158375?seq=1#page_scan_tab_contents
> >>
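
On the DMDA shape question at the top of this message, one way to see which process layouts are even possible is to enumerate them. The following is a minimal Python sketch, not PETSc's own decomposition logic: the function name is hypothetical, it assumes the `-M 80 -N 80 -P 9` values map directly to grid points per dimension, and it ignores refinement. The layout PETSc actually chose is what `DMDAGetInfo` reports.

```python
def candidate_process_grids(ranks, grid_points):
    """List all (m, n, p) with m*n*p == ranks and at least one grid point per rank in each dimension.

    This is an illustrative enumeration only; it is NOT how PETSc picks its default layout.
    """
    M, N, P = grid_points
    grids = []
    for m in range(1, min(ranks, M) + 1):
        if ranks % m:
            continue  # m must divide the rank count
        rest = ranks // m
        for n in range(1, min(rest, N) + 1):
            if rest % n:
                continue  # n must divide what is left
            p = rest // n
            if p <= P:
                grids.append((m, n, p))
    return grids


if __name__ == "__main__":
    # Grid sizes taken from the srun command quoted above (assumed to be points per dimension)
    for ranks in (96, 1032):
        grids = candidate_process_grids(ranks, (80, 80, 9))
        print(ranks, len(grids), grids[:4])
```

Every layout this finds for 1032 ranks has 43 processes along one of the 80-point dimensions, so the subdomains are necessarily uneven there; 96 ranks admits layouts such as (4, 4, 6).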
