On Wed, Feb 2, 2011 at 5:04 PM, Vijay S. Mahadevan <vijay.m at gmail.com> wrote:
> Matt,
>
> The --with-debugging=1 option is certainly not meant for performance
> studies, but I didn't expect it to yield the same CPU time as a
> single-processor run for snes/ex20, i.e., my runs with 1 and 2
> processors take approximately the same amount of time to compute the
> solution. But I am currently configuring without debugging symbols and
> shall let you know what that yields.
>
> On a similar note, is there something extra that needs to be done to
> make use of multi-core machines while using MPI? I am not sure if this
> is even related to PETSc; it could be an MPI configuration option that
> either I or the configure process is missing. All ideas are much
> appreciated.

Sparse MatVec (MatMult) is a memory-bandwidth-limited operation. On most
cheap multicore machines there is a single memory bus, so using more
cores gains you very little extra performance. I still suspect you are
not actually running in parallel, because you would usually see at least
a small speedup. That is why I suggested looking at -log_summary: it
tells you how many processes were run and breaks down the time.

   Matt

> Vijay
>
> On Wed, Feb 2, 2011 at 4:53 PM, Matthew Knepley <knepley at gmail.com> wrote:
> > On Wed, Feb 2, 2011 at 4:46 PM, Vijay S. Mahadevan <vijay.m at gmail.com>
> > wrote:
> >>
> >> Hi,
> >>
> >> I am trying to configure my PETSc install with an MPI installation to
> >> make use of a dual quad-core desktop system running Ubuntu. But even
> >> though the configure/make process went through without problems, the
> >> scalability of the programs doesn't seem to reflect what I expected.
> >> My configure options are
> >>
> >> --download-f-blas-lapack=1 --with-mpi-dir=/usr/lib/ --download-mpich=1
> >> --with-mpi-shared=0 --with-shared=0 --COPTFLAGS=-g
> >> --download-parmetis=1 --download-superlu_dist=1 --download-hypre=1
> >> --download-blacs=1 --download-scalapack=1 --with-clanguage=C++
> >> --download-plapack=1 --download-mumps=1 --download-umfpack=yes
> >> --with-debugging=1 --with-errorchecking=yes
> >
> > 1) For performance studies, make a build using --with-debugging=0
> >
> > 2) Look at -log_summary for a breakdown of performance
> >
> >    Matt
> >
> >> Is there something else that needs to be done as part of the
> >> configure process to enable decent scaling? I am only comparing runs
> >> with mpiexec (-n 1) and (-n 2), but they seem to take approximately
> >> the same time as noted from -log_summary. If it helps, I've been
> >> testing with snes/examples/tutorials/ex20.c for all purposes, with a
> >> custom -grid parameter from the command line to control the number
> >> of unknowns.
> >>
> >> If there is something you've witnessed before in this configuration,
> >> or if you need anything else to analyze the problem, do let me know.
> >>
> >> Thanks,
> >> Vijay
> >
> > --
> > What most experimenters take for granted before they begin their
> > experiments is infinitely more interesting than any results to which
> > their experiments lead.
> > -- Norbert Wiener

--
What most experimenters take for granted before they begin their
experiments is infinitely more interesting than any results to which
their experiments lead.
-- Norbert Wiener
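
To see why the MatMult kernel Matt mentions is bandwidth-bound rather
than compute-bound, here is a minimal sketch in plain C of a CSR sparse
matrix-vector product. This is an illustration only, not PETSc's actual
MatMult implementation:

    /* Illustrative CSR sparse matrix-vector product y = A*x.
     * Each inner iteration streams a matrix value (8 bytes), a column
     * index (4 bytes), and an x entry for only 2 flops, so the memory
     * bus, not the arithmetic units, sets the speed. Extra cores
     * sharing one bus cannot pull those bytes in any faster. */
    void csr_matvec(int nrows, const int *rowptr, const int *colind,
                    const double *val, const double *x, double *y)
    {
        for (int i = 0; i < nrows; i++) {
            double sum = 0.0;
            for (int j = rowptr[i]; j < rowptr[i + 1]; j++)
                sum += val[j] * x[colind[j]];
            y[i] = sum;
        }
    }

A modern core can do those two flops far faster than the bus can
deliver the roughly 12-20 bytes they require, which is why a second
core on the same bus buys little for this operation.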
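
And a minimal sketch of the comparison workflow Matt suggests, assuming
a PETSc source tree of that era; the -grid value is only an illustrative
problem size (the option is the custom one Vijay describes), and the
exact configure/make invocations may differ between PETSc versions:

    # Performance build: no debugging; keep the downloads the thread used.
    ./configure --download-f-blas-lapack=1 --download-mpich=1 --with-debugging=0
    make all

    # Build the example from the thread and compare 1 vs 2 processes.
    # Use the mpiexec from the --download-mpich build (typically under
    # $PETSC_DIR/$PETSC_ARCH/bin). -log_summary reports how many
    # processes actually ran and a per-event time breakdown (e.g. MatMult).
    cd src/snes/examples/tutorials
    make ex20
    mpiexec -n 1 ./ex20 -grid 32 -log_summary
    mpiexec -n 2 ./ex20 -grid 32 -log_summary

If the second report still shows one process, the launch is the problem
(e.g. a different mpiexec than the one MPICH built); if it shows two
processes with MatMult barely faster, that is the memory-bus ceiling
described above.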
