Hmm, I'll look more carefully, but I think your binding is incorrect:

  numactl -l --cpunodebind=$OMPI_COMM_WORLD_LOCAL_RANK ex2 $KSP_ARGS

NUMA node numbering is different from MPI rank numbering, so the local rank cannot be used directly as a NUMA node number.
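For reference, a minimal sketch of a wrapper that folds the local rank onto a valid NUMA node (assuming OpenMPI sets OMPI_COMM_WORLD_LOCAL_RANK and numactl is in the path; the script name and layout here are illustrative, not the attached runPetscProb scripts):

  #!/bin/bash
  # bind_rank.sh -- illustrative sketch only, not one of the attached scripts.
  # Fold the per-node MPI rank onto an existing NUMA node with a modulo, so a
  # local rank larger than the number of NUMA nodes still gets a valid node.
  rank=${OMPI_COMM_WORLD_LOCAL_RANK:-0}

  # Count NUMA nodes from the "node N cpus: ..." lines of numactl --hardware.
  nnodes=$(numactl --hardware | grep -c '^node [0-9]* cpus:')
  nnodes=${nnodes:-1}

  node=$(( rank % nnodes ))

  # Bind both the CPUs and the memory allocation to that node, then exec the
  # real executable with whatever arguments were passed to the wrapper.
  exec numactl --cpunodebind=$node --membind=$node "$@"

which would be launched along the lines of: mpirun -np 8 ./bind_rank.sh ./ex2 $KSP_ARGS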
On Fri, Feb 24, 2012 at 12:11, Nystrom, William D <wdn at lanl.gov> wrote:

> Hi Jed,
>
> Attached is a gzipped tarball of the stuff I used to run these two test
> problems with numactl. Actually, I hacked them a bit this morning because
> I was running them in our test framework for doing acceptance testing of
> new systems. But the scripts in the tarball should give you all the info
> you need. There is a top level script called runPetsc that just invokes
> mpirun from openmpi and calls the wrapper scripts for using numactl. You
> could actually dispense with the top level script and just invoke the
> mpirun commands yourself. I include it as an easy way to document what I
> did. The runPetscProb_1 script runs petsc on the gpus using numactl to
> control the affinities of the gpus to the cpu numa nodes. The
> runPetscProb_2 script runs petsc on the cpus using numactl. Note that both
> of those wrapper scripts are using openmpi variables. I'm not sure how one
> would do the same thing with another flavor of mpi, but I imagine it is
> possible. Also, I'm not sure if there are other more elegant ways to run
> with numactl than using the wrapper script approach. Perhaps there are,
> but this is what we have been doing.
>
> I've also included a Perl script called numa-maps that is useful for
> checking the affinities that you actually get while running, in order to
> make sure that numactl is doing what you think it is doing. I'm not sure
> where this script comes from. I find it on some systems and not on others.
>
> I've also included logs with the output of cpuinfo, nvidia-smi and
> uname -a to answer any questions you had about the system I was running
> on.
>
> Finally, I've included runPetscProb_1.log and runPetscProb_2.log, which
> contain the log_summary output for my latest runs on the gpu and cpu
> respectively. Using numactl reduced the runtime for the gpu case as well,
> but not as much as for the cpu case. So the final result was that running
> the same problem while using all of the gpu resources on a node was about
> 2.5x faster than using all of the cpu resources on the same number of
> nodes.
>
> Let me know if you need any more info. I'm planning to use this stuff to
> help test a new gpu cluster that we have just started acceptance testing
> on. It has the same basic hardware as the testbed cluster for these
> results but has 308 nodes. That should be interesting and fun.
>
> Thanks,
>
> Dave
>
> --
> Dave Nystrom
> LANL HPC-5
> Phone: 505-667-7913
> Email: wdn at lanl.gov
> Smail: Mail Stop B272
>        Group HPC-5
>        Los Alamos National Laboratory
>        Los Alamos, NM 87545
>
> ------------------------------
> *From:* petsc-dev-bounces at mcs.anl.gov [petsc-dev-bounces at mcs.anl.gov]
> on behalf of Jed Brown [jedbrown at mcs.anl.gov]
> *Sent:* Thursday, February 23, 2012 10:43 PM
> *To:* For users of the development version of PETSc
> *Cc:* Dave Nystrom
> *Subject:* Re: [petsc-dev] Understanding Some Parallel Results with PETSc
>
> On Thu, Feb 23, 2012 at 23:41, Dave Nystrom <dnystrom1 at comcast.net> wrote:
>
>> I could also send you my mpi/numactl command lines for gpu and cpu when
>> I am back in the office.
>
> Yes, please.
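The attached wrapper scripts themselves are not reproduced in this archive. As a rough, hypothetical sketch of the gpu-affinity pattern described in the quoted message (the gpu-to-NUMA-node map below is a placeholder that would have to be filled in from the actual node topology, and the script name is made up):

  #!/bin/bash
  # gpu_bind.sh -- hypothetical sketch, not the attached runPetscProb_1 script.
  # Give each local MPI rank its own GPU and bind the rank to the NUMA node
  # that the GPU is assumed to be attached to.
  rank=${OMPI_COMM_WORLD_LOCAL_RANK:-0}

  # Placeholder map from GPU index to NUMA node; must match the real hardware.
  gpu_numa_node=(0 0 1 1)

  gpu=$rank
  node=${gpu_numa_node[$gpu]}

  # Make only the chosen GPU visible to this process, then bind CPUs and
  # memory to that GPU's NUMA node before running the real executable.
  export CUDA_VISIBLE_DEVICES=$gpu
  exec numactl --cpunodebind=$node --membind=$node "$@"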
