On Tue, Apr 22, 2014 at 7:59 AM, Niklas Fischer <[email protected]> wrote:
> I should probably note that everything is fine if I run the serial
> version of this (with the exact same matrix + right hand side).
>
> PETSc KSPSolve done, residual norm: 3.13459e-13, it took 6 iterations.

Yes, your preconditioner is weaker in parallel, since it is block Jacobi.
If you just want to solve the problem, use a parallel sparse direct
factorization, like SuperLU_dist or MUMPS. You reconfigure using
--download-superlu-dist or --download-mumps, and then use

  -pc_type lu -pc_factor_mat_solver_package mumps

If you want a really scalable solution, then you have to know about your
operator, not just the discretization.

   Matt

On 22.04.2014 14:12, Niklas Fischer wrote:

> On 22.04.2014 13:57, Matthew Knepley wrote:
>
>> On Tue, Apr 22, 2014 at 6:48 AM, Niklas Fischer <[email protected]> wrote:
>>
>>> On 22.04.2014 13:08, Jed Brown wrote:
>>>
>>>> Niklas Fischer <[email protected]> writes:
>>>>
>>>>> Hello,
>>>>>
>>>>> I have attached a small test case for a problem I am experiencing.
>>>>> What this dummy program does is read a vector and a matrix from a
>>>>> text file and then solve Ax=b.
>>>>> The same data is available in two forms:
>>>>> - everything is in one file (matops.s.0 and vops.s.0)
>>>>> - the matrix and vector are split between processes (matops.0,
>>>>>   matops.1, vops.0, vops.1)
>>>>>
>>>>> The serial version of the program works perfectly fine, but
>>>>> unfortunately errors occur when running the parallel version:
>>>>>
>>>>> make && mpirun -n 2 a.out matops vops
>>>>>
>>>>> mpic++ -DPETSC_CLANGUAGE_CXX -isystem
>>>>> /home/data/fischer/libs/petsc-3.4.3/arch-linux2-c-debug/include -isystem
>>>>> /home/data/fischer/libs/petsc-3.4.3/include petsctest.cpp -Werror -Wall
>>>>> -Wpedantic -std=c++11 -L
>>>>> /home/data/fischer/libs/petsc-3.4.3/arch-linux2-c-debug/lib -lpetsc
>>>>> /usr/bin/ld: warning: libmpi_cxx.so.0, needed by
>>>>> /home/data/fischer/libs/petsc-3.4.3/arch-linux2-c-debug/lib/libpetsc.so,
>>>>> may conflict with libmpi_cxx.so.1
>>>>> /usr/bin/ld: warning: libmpi.so.0, needed by
>>>>> /home/data/fischer/libs/petsc-3.4.3/arch-linux2-c-debug/lib/libpetsc.so,
>>>>> may conflict with libmpi.so.1
>>>>> librdmacm: couldn't read ABI version.
>>>>> librdmacm: assuming: 4
>>>>> CMA: unable to get RDMA device list
>>>>>
>>>>> --------------------------------------------------------------------------
>>>>> [[43019,1],0]: A high-performance Open MPI point-to-point messaging
>>>>> module was unable to find any relevant network interfaces:
>>>>>
>>>>> Module: OpenFabrics (openib)
>>>>> Host: dornroeschen.igpm.rwth-aachen.de
>>>>> CMA: unable to get RDMA device list
>>>>
>>>> It looks like your MPI is either broken or some of the code linked into
>>>> your application was compiled with a different MPI or different version.
>>>> Make sure you can compile and run simple MPI programs in parallel.
>>>
>>> Hello Jed,
>>>
>>> thank you for your input. Unfortunately, MPI does not seem to be the
>>> issue here.
>>> The attachment contains a simple MPI hello world program which runs
>>> flawlessly (I will append the output to this mail), and I have not
>>> encountered any problems with other MPI programs. My question still
>>> stands.

>> This is a simple error. You created the matrix A using PETSC_COMM_WORLD,
>> but you try to view it using PETSC_VIEWER_STDOUT_SELF. You need to use
>> PETSC_VIEWER_STDOUT_WORLD in order to match.
>>
>>   Thanks,
>>
>>      Matt

>>> Greetings,
>>> Niklas Fischer
>>>
>>> mpirun -np 2 ./mpitest
>>>
>>> librdmacm: couldn't read ABI version.
>>> librdmacm: assuming: 4
>>> CMA: unable to get RDMA device list
>>> --------------------------------------------------------------------------
>>> [[44086,1],0]: A high-performance Open MPI point-to-point messaging module
>>> was unable to find any relevant network interfaces:
>>>
>>> Module: OpenFabrics (openib)
>>> Host: dornroeschen.igpm.rwth-aachen.de
>>>
>>> Another transport will be used instead, although this may result in
>>> lower performance.
>>> --------------------------------------------------------------------------
>>> librdmacm: couldn't read ABI version.
>>> librdmacm: assuming: 4
>>> CMA: unable to get RDMA device list
>>> Hello world from processor dornroeschen.igpm.rwth-aachen.de, rank 0 out
>>> of 2 processors
>>> Hello world from processor dornroeschen.igpm.rwth-aachen.de, rank 1 out
>>> of 2 processors
>>> [dornroeschen.igpm.rwth-aachen.de:128141] 1 more process has sent help
>>> message help-mpi-btl-base.txt / btl:no-nics
>>> [dornroeschen.igpm.rwth-aachen.de:128141] Set MCA parameter
>>> "orte_base_help_aggregate" to 0 to see all help / error messages

> Thank you, Matthew, this solves my viewing problem. Am I doing something
> wrong when initializing the matrices as well? The matrix's viewing output
> starts with "Matrix Object: 1 MPI processes" and the Krylov solver does
> not converge.
> Your help is really appreciated,
> Niklas Fischer

-- 
What most experimenters take for granted before they begin their
experiments is infinitely more interesting than any results to which their
experiments lead.
   -- Norbert Wiener
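[Editor's note: the direct-solver route Matt suggests above can be sketched as
shell commands. The --download-scalapack flag (MUMPS depends on ScaLAPACK) and
the program name ./a.out are assumptions for illustration, not taken from the
thread; the option name -pc_factor_mat_solver_package matches the PETSc 3.4
series used here.]

```shell
# 1) Reconfigure PETSc with a parallel sparse direct solver
#    (run from the PETSc source directory; flags per the PETSc configure docs):
./configure --download-mumps --download-scalapack   # or --download-superlu_dist
make

# 2) Rerun the unchanged program, swapping the preconditioner for a full
#    parallel LU factorization at the command line:
mpirun -n 2 ./a.out matops vops \
  -pc_type lu -pc_factor_mat_solver_package mumps \
  -ksp_monitor -ksp_converged_reason
```

Because PETSc reads these as runtime options, no recompilation of the user
program is needed to switch solvers.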

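[Editor's note: Matt's diagnosis, a matrix created on PETSC_COMM_WORLD but
viewed with PETSC_VIEWER_STDOUT_SELF, can be illustrated with a minimal
sketch. The matrix sizes and the empty assembly are placeholders, not
Niklas's actual test case; it assumes a working PETSc installation.]

```c
/* Sketch: the communicator of an object must match that of its viewer. */
#include <petscmat.h>

int main(int argc, char **argv)
{
  Mat A;

  PetscInitialize(&argc, &argv, NULL, NULL);

  /* A lives on PETSC_COMM_WORLD: one parallel matrix across all ranks. */
  MatCreate(PETSC_COMM_WORLD, &A);
  MatSetSizes(A, PETSC_DECIDE, PETSC_DECIDE, 10, 10);  /* placeholder size */
  MatSetFromOptions(A);
  MatSetUp(A);
  MatAssemblyBegin(A, MAT_FINAL_ASSEMBLY);
  MatAssemblyEnd(A, MAT_FINAL_ASSEMBLY);

  /* Wrong: a SELF viewer on a WORLD object gives each rank its own,
     interleaved view of only the local part.
     MatView(A, PETSC_VIEWER_STDOUT_SELF); */

  /* Right: the WORLD viewer matches the matrix's communicator and prints
     the assembled parallel matrix once. */
  MatView(A, PETSC_VIEWER_STDOUT_WORLD);

  MatDestroy(&A);
  PetscFinalize();
  return 0;
}
```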