> On Apr 12, 2015, at 12:48 PM, Gideon Simpson <[email protected]> wrote:
>
> I was hoping to demonstrate in my class the computational gain with petsc/mpi
> in solving a simple problem, like discretized poisson or heat, as the number
> of processes increases. Can anyone recommend any of the petsc examples for
> this purpose? Perhaps I’m just using poorly chosen KSP/PC pairs, but I
> haven’t been able to observe gain. I’m planning to demo this on a commodity
> intel cluster with infiniband.
Gideon,
I would use src/ksp/ksp/examples/tutorials/ex45 to get across three
concepts:
1) algorithmic complexity (1 process). Run it with several levels of
refinement (say -da_refine 4, depending on how much memory you have) with
a) -pc_type jacobi -ksp_type bcgs (algorithm with poor computational
complexity, but very parallel)
b) -pc_type mg -ksp_type bcgs (algorithm with good computational
complexity; parallelizes well, but less well than Jacobi)
Then run it again with one more level of refinement (say -da_refine 5)
and see how much more time each method takes.
2) scaling (2 processes). Run as in 1) but on two processes, and note that
the "poorer" algorithm, Jacobi, gives better "speedup" than mg.
3) understanding the limitations of your machine (see
http://www.mcs.anl.gov/petsc/documentation/faq.html#computers): the total
memory bandwidth of all your available cores determines the performance of the
PETSc solvers. So run the streams benchmark (now included with PETSc in the
src/benchmarks/streams directory) to see its speedup as you vary the number of
cores and how they are combined on each node and across nodes, and then run the
PETSc example to see its speedups. Note that you will likely have to do
something smarter than mpiexec -n n ./ex45 ... to launch the program, since you
need control over which nodes MPI places each process on; for example, does it
spread the processes one per node, or first pack them onto one node? (Check the
documentation for your mpiexec for how to control this.) You will find that
different choices lead to very different performance, and this can be related
to the streams benchmark and available memory bandwidth.
Barry
>
> -gideon
>