[petsc-dev] examples/benchmarks for weak and strong scaling exercise
On Fri, Apr 12, 2013 at 9:18 AM, Chris Kees cekees at gmail.com wrote:

> I updated the results for the Bratu problem on our SGI. It has 8 cores per
> node (two 4-core processors per node), and I ran from 1 to 256 cores. The
> log_summary output is attached for both studies.

Strong scaling: this looks fine. You get the classic memory-bandwidth starvation after 2 cores on the same node (although your scaling does not completely bottom out), and across nodes the scaling is great.

Weak scaling: I have to go through the logs, but obviously something is wrong. I am betting it is the failure to increase the number of GMG levels with increasing problem size.

> Is there anything about the memory usage of that problem that doesn't
> scale? The memory usage looks steady at 1 GB per core based on
> log_summary. I ask because last night I tried to do one more level of
> refinement for weak scaling on 1024 cores and it crashed. I ran the same
> job on 512 cores this morning, and it ran fine, so I'm hoping the issue
> was a temporary system problem.

No, the memory usage is scalable.

Thanks,

Matt

> Notes: There is a shift in the strong-scaling curve as it fills up the
> first node (i.e., from 1 to 16 cores); after that it looks perfect. The
> shift seems reasonable given the sharing of the cache by 4 cores. The weak
> scaling shows growth in the wall clock from 6.3 seconds to 17 seconds. I'm
> going to run that again with a larger coarse grid in order to increase the
> runtime to several minutes.
>
> Graphs: https://proteus.usace.army.mil/home/pub/17/

[...]

-- 
What most experimenters take for granted before they begin their experiments is infinitely more interesting than any results to which their experiments lead. -- Norbert Wiener
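The weak-scaling growth Chris reports (6.3 s at the small end, 17 s at the large end) works out to a fairly low parallel efficiency. A minimal sketch of the arithmetic (the helper name is mine, not a PETSc utility; ideal weak scaling would keep the wall-clock time constant):

```shell
# Weak-scaling efficiency: with a fixed problem size per core, ideal
# scaling keeps the wall-clock time constant, so efficiency = t_base / t_n.
weak_eff() {
  awk -v t0="$1" -v tn="$2" 'BEGIN { printf "%.2f", t0 / tn }'
}

# Times reported in the thread: 6.3 s and 17 s.
weak_eff 6.3 17   # prints 0.37
```

An efficiency this far below 1 is consistent with Matt's guess that the number of multigrid levels was not growing with the problem size, so coarse-grid solves got relatively more expensive.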
[petsc-dev] examples/benchmarks for weak and strong scaling exercise
Thanks a lot. I did a little example with the Bratu problem and posted it here: https://proteus.usace.army.mil/home/pub/17/

I used boomeramg instead of geometric multigrid because I was getting an error with the options above:

%mpiexec -np 4 ./ex5 -mx 129 -my 129 -Nx 2 -Ny 2 -pc_type mg -pc_mg_levels 2
[0]PETSC ERROR: - Error Message
[0]PETSC ERROR: Argument out of range!
[0]PETSC ERROR: New nonzero at (66,1) caused a malloc!
[0]PETSC ERROR:

I like the ice paper and will try to get the contractor started on reproducing those results.

-Chris

[...]
[petsc-dev] examples/benchmarks for weak and strong scaling exercise
What is going on with those results? In both cases the first parallel run is seriously outperforming the single-core run. I'd be interested in seeing the two log_summary outputs. I can only assume that the PETSc developers have an "if (serial) sleep(10);" buried somewhere in the source from their last Gordon Bell run :)

A

On Thu, Apr 11, 2013 at 6:33 PM, Chris Kees cekees at gmail.com wrote:

> Thanks a lot. I did a little example with the Bratu problem and posted it
> here: https://proteus.usace.army.mil/home/pub/17/

[...]
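The "parallel beats serial" effect above is easy to quantify: speedup on p cores is S = t1/tp, and S > p (superlinear speedup) usually points to cache effects or a memory-starved serial baseline rather than anything buried in the library. A sketch with made-up numbers (the helper name is mine):

```shell
# Speedup: serial time divided by parallel time. A value above the core
# count is "superlinear", typically a cache-capacity effect.
speedup() {
  awk -v t1="$1" -v tp="$2" 'BEGIN { printf "%.1f", t1 / tp }'
}

# Hypothetical example: 24 s serial, 10 s on 2 cores.
speedup 24.0 10.0   # prints 2.4, i.e. superlinear on 2 cores
```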
[petsc-dev] examples/benchmarks for weak and strong scaling exercise
Chris Kees cekees at gmail.com writes:

> Thanks a lot. I did a little example with the Bratu problem and posted it
> here: https://proteus.usace.army.mil/home/pub/17/ I used boomeramg instead
> of geometric multigrid because I was getting an error with the options
> above:
>
> %mpiexec -np 4 ./ex5 -mx 129 -my 129 -Nx 2 -Ny 2 -pc_type mg -pc_mg_levels 2
> [0]PETSC ERROR: - Error Message
> [0]PETSC ERROR: Argument out of range!
> [0]PETSC ERROR: New nonzero at (66,1) caused a malloc!
> [0]PETSC ERROR:

That test hard-codes evil things (presumably for testing purposes, though maybe the functionality has been subsumed). Please use src/snes/examples/tutorials/ex5.c instead:

mpiexec -n 4 ./ex5 -da_grid_x 65 -da_grid_y 65 -pc_type mg -log_summary -da_refine 1

Increase '-da_refine 1' to get higher resolution. (This will increase the number of MG levels used by PCMG.) Switch '-da_refine 1' to '-snes_grid_sequence 1' if you want FMG, but note that it's trickier to profile because proportionately more time is spent on coarse levels (although the total solve time is lower).

> I like the ice paper and will try to get the contractor started on
> reproducing those results.
>
> -Chris

[...]
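Each '-da_refine' step in Jed's command line roughly doubles the resolution. A sketch of the resulting grid sizes, assuming the usual rule for refining a non-periodic DMDA, where an m-point grid becomes 2m - 1 points per dimension (the helper function is hypothetical, not a PETSc tool):

```shell
# Grid size per dimension after r applications of '-da_refine',
# assuming each refinement maps m points to 2*m - 1.
refined_size() {
  m="$1"; r="$2"
  i=0
  while [ "$i" -lt "$r" ]; do
    m=$(( 2 * m - 1 ))
    i=$(( i + 1 ))
  done
  echo "$m"
}

# Starting from the 65x65 coarse grid in the command line above:
refined_size 65 1   # prints 129
refined_size 65 3   # prints 513
```

This is also why the number of PCMG levels grows with '-da_refine': each refinement adds one level between the 65x65 coarse grid and the fine grid.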
[petsc-dev] examples/benchmarks for weak and strong scaling exercise
On Wed, Apr 10, 2013 at 10:04 AM, Chris Kees cekees at gmail.com wrote:

> Hi guys, could somebody point me to some examples you guys routinely use
> for weak and strong scaling studies (maybe even with scripts, option
> files, or prior results on recent hardware)? I'm thinking of 3D Poisson
> with finite differences and geometric multigrid, or something like that.

I am trying to write this stuff down, but there is not much written right now, so let's start at the beginning. SNES ex5 is the simplest example that can be used for this, I think. It is 2D Poisson (actually Bratu). It's really easy to get weak scaling by adjusting the grid size using

  -da_grid_x m -da_grid_y n

You can turn on MG using

  -pc_type mg -pc_mg_levels n

although the slightly non-intuitive thing is that the grid size you input is then for the coarse grid.

> We've been trying to work toward scaling studies of the fieldsplit and
> Schur complement preconditioners for our multiphase flow solvers, but I'm
> realizing that we need to do more thorough testing of the PETSc
> installation itself and make sure we're using timing/profiling best
> practices and such. We are using petsc-dev on the hardware below. I
> promise to quit using petsc-dev as soon as the next release comes out :)
> Several versions of PETSc are also installed by the system maintainers,
> but my sense is that there is very little testing done on any of the
> installations.

I think using petsc-dev is the right thing, and it is now much, much more stable (the 'master' branch on Bitbucket).

Thanks,

Matt

> http://www.erdc.hpc.mil/hardware/index.html
>
> Chris
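Matt's recipe can be turned into a small weak-scaling sweep. A sketch (the helper and the particular process counts are mine, not from the thread): quadruple the process count while doubling the grid in each dimension, which keeps the number of grid points per process roughly fixed.

```shell
# Emit one weak-scaling run for SNES ex5: np processes on an m x m grid.
# Doubling m per dimension while quadrupling np holds work per process
# roughly constant (65^2/1 = 4225 vs 129^2/4 = 4160 vs 257^2/16 = 4128).
weak_cmd() {
  np="$1"; m="$2"
  echo "mpiexec -n $np ./ex5 -da_grid_x $m -da_grid_y $m -pc_type mg -log_summary"
}

weak_cmd 1 65
weak_cmd 4 129
weak_cmd 16 257
```

Note that with -pc_type mg and a fixed -pc_mg_levels, the coarse grid grows along with the fine grid, which is one way to end up with the poor weak scaling discussed later in the thread.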
[petsc-dev] examples/benchmarks for weak and strong scaling exercise
Chris Kees cekees at gmail.com writes:

> Hi guys, could somebody point me to some examples you guys routinely use
> for weak and strong scaling studies (maybe even with scripts, option
> files, or prior results on recent hardware)? I'm thinking of 3D Poisson
> with finite differences and geometric multigrid, or something like that.

One option would be to use src/snes/examples/tutorials/ex48.c with the configurations from http://dx.doi.org/10.1137/110834512 (http://59A2.org/files/hstat.pdf), which you can find in the paper repository: https://github.com/jedbrown/tme-ice. Look in shaheen/b/. Those runs used DMMG, so the command lines will have to be modified slightly, but it should be straightforward, and you can compare to the runex48_* targets in src/snes/examples/tutorials/makefile.

> We are using petsc-dev on the hardware below. I promise to quit using
> petsc-dev as soon as the next release comes out :)

We're actually happy to have people using petsc-dev. One motivation for our new workflow is that we can now provide a pretty stable 'master', so that we can interact with users on new features without the latency of a release cycle and without frequent breakage.
[petsc-dev] examples/benchmarks for weak and strong scaling exercise
Jed,

I tried cloning your tme-ice git repo as follows, and it failed:

% git clone --recursive git://github.com/jedbrown/tme-ice.git tme_ice
Cloning into 'tme_ice'...
fatal: unable to connect to github.com:
github.com[0: 204.232.175.90]: errno=Connection timed out

I'm doing this from an xterm that allows me to clone petsc just fine. Any idea what the problem might be?

Dave

From: petsc-dev-bounces at mcs.anl.gov [petsc-dev-bounces at mcs.anl.gov] on behalf of Jed Brown [jedbr...@mcs.anl.gov]
Sent: Wednesday, April 10, 2013 9:22 AM
To: Chris Kees; petsc-dev at mcs.anl.gov
Subject: Re: [petsc-dev] examples/benchmarks for weak and strong scaling exercise

[...]
[petsc-dev] examples/benchmarks for weak and strong scaling exercise
Sorry, I overlooked that the URL was using the git protocol. My bad.

Dave

From: Jed Brown [five9a2 at gmail.com] on behalf of Jed Brown [jedbr...@mcs.anl.gov]
Sent: Wednesday, April 10, 2013 12:10 PM
To: Nystrom, William D; For users of the development version of PETSc; Chris Kees
Subject: Re: [petsc-dev] examples/benchmarks for weak and strong scaling exercise

Nystrom, William D wdn at lanl.gov writes:

> Jed, I tried cloning your tme-ice git repo as follows, and it failed:
>
> % git clone --recursive git://github.com/jedbrown/tme-ice.git tme_ice
> Cloning into 'tme_ice'...
> fatal: unable to connect to github.com:
> github.com[0: 204.232.175.90]: errno=Connection timed out
>
> I'm doing this from an xterm that allows me to clone petsc just fine.

You're using https or ssh to clone PETSc, but the git:// protocol to clone tme-ice. The LANL network is blocking that port, so just use the https or ssh protocol.
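The fix Jed describes amounts to rewriting the URL scheme; git:// uses port 9418, which firewalls commonly block, while https works almost everywhere. A sketch (the helper is mine; plain `git clone --recursive https://github.com/jedbrown/tme-ice.git tme_ice` is of course enough by hand):

```shell
# Rewrite a git:// clone URL to https so it passes restrictive firewalls.
to_https() {
  printf '%s\n' "$1" | sed 's|^git://|https://|'
}

to_https 'git://github.com/jedbrown/tme-ice.git'
# prints https://github.com/jedbrown/tme-ice.git

# Then: git clone --recursive "$(to_https 'git://github.com/jedbrown/tme-ice.git')" tme_ice
```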