We need all the information from -log_summary to see what is going on. I am not sure what -grid 20 means, but do not expect any good parallel performance with fewer than roughly 10,000 unknowns per process.
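For example, something along these lines should capture the complete report for both runs so it can be posted in full (a sketch only: the log file names are arbitrary, and -log_summary prints its report to stdout when the program finishes):

    /usr/lib/petsc/linux-gnu-cxx-opt/bin/mpiexec -n 1 ./ex20 -grid 20 -log_summary > log.n1.txt 2>&1
    /usr/lib/petsc/linux-gnu-cxx-opt/bin/mpiexec -n 2 ./ex20 -grid 20 -log_summary > log.n2.txt 2>&1

The per-event breakdown near the end of that report (time and flops per operation, message counts) is the part needed to diagnose where the time goes.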
Barry

On Feb 2, 2011, at 5:38 PM, Vijay S. Mahadevan wrote:

> Here are the performance statistics for the 1- and 2-processor runs.
>
> /usr/lib/petsc/linux-gnu-cxx-opt/bin/mpiexec -n 1 ./ex20 -grid 20 -log_summary
>
>                             Max     Max/Min        Avg      Total
> Time (sec):           8.452e+00    1.00000   8.452e+00
> Objects:              1.470e+02    1.00000   1.470e+02
> Flops:                5.045e+09    1.00000   5.045e+09  5.045e+09
> Flops/sec:            5.969e+08    1.00000   5.969e+08  5.969e+08
> MPI Messages:         0.000e+00    0.00000   0.000e+00  0.000e+00
> MPI Message Lengths:  0.000e+00    0.00000   0.000e+00  0.000e+00
> MPI Reductions:       4.440e+02    1.00000
>
> /usr/lib/petsc/linux-gnu-cxx-opt/bin/mpiexec -n 2 ./ex20 -grid 20 -log_summary
>
>                             Max     Max/Min        Avg      Total
> Time (sec):           7.851e+00    1.00000   7.851e+00
> Objects:              2.000e+02    1.00000   2.000e+02
> Flops:                4.670e+09    1.00580   4.657e+09  9.313e+09
> Flops/sec:            5.948e+08    1.00580   5.931e+08  1.186e+09
> MPI Messages:         7.965e+02    1.00000   7.965e+02  1.593e+03
> MPI Message Lengths:  1.412e+07    1.00000   1.773e+04  2.824e+07
> MPI Reductions:       1.046e+03    1.00000
>
> I am not entirely sure I can make sense of those statistics, but if
> there is something more you need, please feel free to let me know.
>
> Vijay
>
> On Wed, Feb 2, 2011 at 5:15 PM, Matthew Knepley <knepley at gmail.com> wrote:
>> On Wed, Feb 2, 2011 at 5:04 PM, Vijay S. Mahadevan <vijay.m at gmail.com> wrote:
>>>
>>> Matt,
>>>
>>> The --with-debugging=1 option is certainly not meant for performance
>>> studies, but I did not expect it to yield the same CPU time on two
>>> processors as on one: my runs of snes/ex20 with 1 and 2 processors
>>> take approximately the same amount of time to compute the solution.
>>> I am currently configuring without debugging symbols and will let
>>> you know what that yields.
>>>
>>> On a similar note, does anything extra need to be done to make use
>>> of multi-core machines when using MPI? I am not sure whether this is
>>> even related to PETSc; it could be an MPI configuration option that
>>> either I or the configure process is missing. All ideas are much
>>> appreciated.
>>
>> Sparse MatVec (MatMult) is a memory-bandwidth-limited operation. Most
>> cheap multicore machines have a single memory bus, so using more
>> cores gains you very little extra performance. I still suspect you
>> are not actually running in parallel, because you would usually see
>> at least a small speedup. That is why I suggested looking at
>> -log_summary: it tells you how many processes were run and breaks
>> down the time.
>>
>>    Matt
>>
>>> Vijay
>>>
>>> On Wed, Feb 2, 2011 at 4:53 PM, Matthew Knepley <knepley at gmail.com> wrote:
>>>> On Wed, Feb 2, 2011 at 4:46 PM, Vijay S. Mahadevan <vijay.m at gmail.com> wrote:
>>>>>
>>>>> Hi,
>>>>>
>>>>> I am trying to configure my PETSc install with an MPI installation
>>>>> to make use of a dual quad-core desktop system running Ubuntu.
>>>>> Even though the configure/make process went through without
>>>>> problems, the scalability of the programs does not seem to reflect
>>>>> what I expected.
>>>>>
>>>>> My configure options are
>>>>>
>>>>> --download-f-blas-lapack=1 --with-mpi-dir=/usr/lib/ --download-mpich=1
>>>>> --with-mpi-shared=0 --with-shared=0 --COPTFLAGS=-g
>>>>> --download-parmetis=1 --download-superlu_dist=1 --download-hypre=1
>>>>> --download-blacs=1 --download-scalapack=1 --with-clanguage=C++
>>>>> --download-plapack=1 --download-mumps=1 --download-umfpack=yes
>>>>> --with-debugging=1 --with-errorchecking=yes
>>>>
>>>> 1) For performance studies, make a build using --with-debugging=0
>>>> 2) Look at -log_summary for a breakdown of performance
>>>>
>>>>    Matt
>>>>
>>>>>
>>>>> Is there something else that needs to be done as part of the
>>>>> configure process to enable decent scaling? I am only comparing
>>>>> runs with mpiexec (-n 1) and (-n 2), but they seem to take
>>>>> approximately the same time, as noted from -log_summary. If it
>>>>> helps, I have been testing with snes/examples/tutorials/ex20.c
>>>>> throughout, with a custom -grid parameter from the command line to
>>>>> control the number of unknowns.
>>>>>
>>>>> If there is something you have seen before in this configuration,
>>>>> or if you need anything else to analyze the problem, do let me
>>>>> know.
>>>>>
>>>>> Thanks,
>>>>> Vijay
>>>>
>>>> --
>>>> What most experimenters take for granted before they begin their
>>>> experiments is infinitely more interesting than any results to
>>>> which their experiments lead.
>>>> -- Norbert Wiener
>>
>> --
>> What most experimenters take for granted before they begin their
>> experiments is infinitely more interesting than any results to which
>> their experiments lead.
>> -- Norbert Wiener
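For what it is worth, the non-debug rebuild Matt suggests would look something like the following (a sketch only: the -O3 optimization flags are illustrative, and the remaining --download options from the original configure line can be carried over unchanged in place of the "..."):

    ./configure --with-debugging=0 --COPTFLAGS=-O3 --CXXOPTFLAGS=-O3 \
        --download-mpich=1 --with-clanguage=C++ ...

And a quick way to confirm, independently of PETSc, that mpiexec really launches two processes:

    /usr/lib/petsc/linux-gnu-cxx-opt/bin/mpiexec -n 2 hostname
    # should print the hostname twice if both ranks actually start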
