Hello,

I have attached a small test case for a problem I am experiencing. The dummy program reads a vector and a matrix from a text file and then solves Ax=b. The same data is available in two forms:
 - everything in one file (matops.s.0 and vops.s.0)
 - the matrix and vector split between processes (matops.0, matops.1, vops.0, vops.1)
The serial version of the program works fine, but errors occur when running the parallel version:

make && mpirun -n 2 a.out matops vops
mpic++ -DPETSC_CLANGUAGE_CXX -isystem /home/data/fischer/libs/petsc-3.4.3/arch-linux2-c-debug/include -isystem /home/data/fischer/libs/petsc-3.4.3/include petsctest.cpp -Werror -Wall -Wpedantic -std=c++11 -L /home/data/fischer/libs/petsc-3.4.3/arch-linux2-c-debug/lib -lpetsc
/usr/bin/ld: warning: libmpi_cxx.so.0, needed by /home/data/fischer/libs/petsc-3.4.3/arch-linux2-c-debug/lib/libpetsc.so, may conflict with libmpi_cxx.so.1
/usr/bin/ld: warning: libmpi.so.0, needed by /home/data/fischer/libs/petsc-3.4.3/arch-linux2-c-debug/lib/libpetsc.so, may conflict with libmpi.so.1
librdmacm: couldn't read ABI version.
librdmacm: assuming: 4
CMA: unable to get RDMA device list
--------------------------------------------------------------------------
[[43019,1],0]: A high-performance Open MPI point-to-point messaging module
was unable to find any relevant network interfaces:

Module: OpenFabrics (openib)
Host: dornroeschen.igpm.rwth-aachen.de
CMA: unable to get RDMA device list
--------------------------------------------------------------------------
[[43019,1],0]: A high-performance Open MPI point-to-point messaging module
was unable to find any relevant network interfaces:

Module: OpenFabrics (openib)
Host: dornroeschen.igpm.rwth-aachen.de

Another transport will be used instead, although this may result
Another transport will be used instead, although this may result
in lower performance.
--------------------------------------------------------------------------
librdmacm: couldn't read ABI version.
librdmacm: assuming: 4
CMA: unable to get RDMA device list
Matrix size is 32x32
Matrix size is 32x32
Vector size is 32
Vector size is 32
[1]PETSC ERROR: --------------------- Error Message ------------------------------------
[1]PETSC ERROR: Arguments must have same communicators!
[1]PETSC ERROR: Different communicators in the two objects: Argument # 1 and 2 flag 3!
[1]PETSC ERROR: ------------------------------------------------------------------------
[1]PETSC ERROR: Petsc Release Version 3.4.0, May, 13, 2013
[1]PETSC ERROR: See docs/changes/index.html for recent updates.
[1]PETSC ERROR: See docs/faq.html for hints about trouble shooting.
[1]PETSC ERROR: See docs/index.html for manual pages.
[1]PETSC ERROR: ------------------------------------------------------------------------
[1]PETSC ERROR: petscwrap on a arch-linux2-c-debug named dornroeschen.igpm.rwth-aachen.de by fischer Tue Apr 22 11:14:39 2014
[1]PETSC ERROR: Libraries linked from /usr/local/openmpi-4.8.1/lib
[1]PETSC ERROR: Configure run at Fri Jun 7 09:32:12 2013
[1]PETSC ERROR: Configure options --with-netcdf=1 --prefix=/usr/local/openmpi-4.8.1 --with-mpi=1 --with-shared-libraries=1 --with-hdf=1 --with-hdf-dir=/usr/local/openmpi-4.8.1 --with-hdf-dir=/usr/local/openmpi-4.8.1 --with-netcdf-lib=/usr/local/openmpi-4.8.1/lib64/libnetcdf.so --with-metis=1 --with-netcdf-include=/usr/local/openmpi-4.8.1/include --with-metis-dir=/usr/local/openmpi-4.8.1 --COPTFLAGS=-O3 --CXXOPTFLAGS=-O3 --FOPTFLAGS=-O3 --libdir=/usr/local/openmpi-4.8.1/lib64
[1]PETSC ERROR: ------------------------------------------------------------------------
[1]PETSC ERROR: MatView() line 811 in /local/petsc-3.4.0/src/mat/interface/matrix.c
[0]PETSC ERROR: --------------------- Error Message ------------------------------------
[0]PETSC ERROR: Arguments must have same communicators!
[0]PETSC ERROR: Different communicators in the two objects: Argument # 1 and 2 flag 3!
[0]PETSC ERROR: ------------------------------------------------------------------------
[0]PETSC ERROR: Petsc Release Version 3.4.0, May, 13, 2013
[0]PETSC ERROR: See docs/changes/index.html for recent updates.
[0]PETSC ERROR: See docs/faq.html for hints about trouble shooting.
[0]PETSC ERROR: See docs/index.html for manual pages.
[0]PETSC ERROR: ------------------------------------------------------------------------
[0]PETSC ERROR: petscwrap on a arch-linux2-c-debug named dornroeschen.igpm.rwth-aachen.de by fischer Tue Apr 22 11:14:39 2014
[1]PETSC ERROR: --------------------- Error Message ------------------------------------
[1]PETSC ERROR: Arguments must have same communicators!
[1]PETSC ERROR: Different communicators in the two objects: Argument # 1 and 2 flag 3!
[1]PETSC ERROR: ------------------------------------------------------------------------
[1]PETSC ERROR: Petsc Release Version 3.4.0, May, 13, 2013
[1]PETSC ERROR: See docs/changes/index.html for recent updates.
[1]PETSC ERROR: See docs/faq.html for hints about trouble shooting.
[1]PETSC ERROR: See docs/index.html for manual pages.
[1]PETSC ERROR: ------------------------------------------------------------------------
[1]PETSC ERROR: petscwrap on a arch-linux2-c-debug named dornroeschen.igpm.rwth-aachen.de by fischer Tue Apr 22 11:14:39 2014
[1]PETSC ERROR: Libraries linked from /usr/local/openmpi-4.8.1/lib
[1]PETSC ERROR: Configure run at Fri Jun 7 09:32:12 2013
[1]PETSC ERROR: Configure options --with-netcdf=1 --prefix=/usr/local/openmpi-4.8.1 --with-mpi=1 --with-shared-libraries=1 --with-hdf=1 --with-hdf-dir=/usr/local/openmpi-4.8.1 --with-hdf-dir=/usr/local/openmpi-4.8.1 --with-netcdf-lib=/usr/local/openmpi-4.8.1/lib64/libnetcdf.so --with-metis=1 --with-netcdf-include=/usr/local/openmpi-4.8.1/include --with-metis-dir=/usr/local/openmpi-4.8.1 --COPTFLAGS=-O3 --CXXOPTFLAGS=-O3 --FOPTFLAGS=-O3 --libdir=/usr/local/openmpi-4.8.1/lib64
[1]PETSC ERROR: ------------------------------------------------------------------------
[1]PETSC ERROR: MatView() line 811 in /local/petsc-3.4.0/src/mat/interface/matrix.c
[1]PETSC ERROR: main() line 108 in "unknowndirectory/"petsctest.cpp
[0]PETSC ERROR: Libraries linked from /usr/local/openmpi-4.8.1/lib
[0]PETSC ERROR: Configure run at Fri Jun 7 09:32:12 2013
[0]PETSC ERROR: Configure options --with-netcdf=1 --prefix=/usr/local/openmpi-4.8.1 --with-mpi=1 --with-shared-libraries=1 --with-hdf=1 --with-hdf-dir=/usr/local/openmpi-4.8.1 --with-hdf-dir=/usr/local/openmpi-4.8.1 --with-netcdf-lib=/usr/local/openmpi-4.8.1/lib64/libnetcdf.so --with-metis=1 --with-netcdf-include=/usr/local/openmpi-4.8.1/include --with-metis-dir=/usr/local/openmpi-4.8.1 --COPTFLAGS=-O3 --CXXOPTFLAGS=-O3 --FOPTFLAGS=-O3 --libdir=/usr/local/openmpi-4.8.1/lib64
[0]PETSC ERROR: ------------------------------------------------------------------------
[0]PETSC ERROR: MatView() line 811 in /local/petsc-3.4.0/src/mat/interface/matrix.c
[0]PETSC ERROR: --------------------- Error Message ------------------------------------
[0]PETSC ERROR: Arguments must have same communicators!
[0]PETSC ERROR: Different communicators in the two objects: Argument # 1 and 2 flag 3!
[0]PETSC ERROR: ------------------------------------------------------------------------
[0]PETSC ERROR: Petsc Release Version 3.4.0, May, 13, 2013
[0]PETSC ERROR: See docs/changes/index.html for recent updates.
[0]PETSC ERROR: See docs/faq.html for hints about trouble shooting.
[0]PETSC ERROR: See docs/index.html for manual pages.
[0]PETSC ERROR: ------------------------------------------------------------------------
[0]PETSC ERROR: petscwrap on a arch-linux2-c-debug named dornroeschen.igpm.rwth-aachen.de by fischer Tue Apr 22 11:14:39 2014
[0]PETSC ERROR: Libraries linked from /usr/local/openmpi-4.8.1/lib
[0]PETSC ERROR: Configure run at Fri Jun 7 09:32:12 2013
[0]PETSC ERROR: Configure options --with-netcdf=1 --prefix=/usr/local/openmpi-4.8.1 --with-mpi=1 --with-shared-libraries=1 --with-hdf=1 --with-hdf-dir=/usr/local/openmpi-4.8.1 --with-hdf-dir=/usr/local/openmpi-4.8.1 --with-netcdf-lib=/usr/local/openmpi-4.8.1/lib64/libnetcdf.so --with-metis=1 --with-netcdf-include=/usr/local/openmpi-4.8.1/include --with-metis-dir=/usr/local/openmpi-4.8.1 --COPTFLAGS=-O3 --CXXOPTFLAGS=-O3 --FOPTFLAGS=-O3 --libdir=/usr/local/openmpi-4.8.1/lib64
[0]PETSC ERROR: ------------------------------------------------------------------------
[0]PETSC ERROR: MatView() line 811 in /local/petsc-3.4.0/src/mat/interface/matrix.c
[0]PETSC ERROR: main() line 108 in "unknowndirectory/"petsctest.cpp
--------------------------------------------------------------------------
MPI_ABORT was invoked on rank 0 in communicator MPI_COMM_WORLD
with errorcode 80.

NOTE: invoking MPI_ABORT causes Open MPI to kill all MPI processes.
You may or may not see output from other processes, depending on
exactly when Open MPI kills them.
--------------------------------------------------------------------------
PETSc KSPSolve done, residual norm: 30.4324, it took 10000 iterations.
PETSc KSPSolve done, residual norm: 30.4324, it took 10000 iterations.
--------------------------------------------------------------------------
mpirun has exited due to process rank 0 with PID 127153 on
node dornroeschen.igpm.rwth-aachen.de exiting improperly. There are two
reasons this could occur:

1. this process did not call "init" before exiting, but others in the job did.
This can cause a job to hang indefinitely while it waits for all
processes to call "init". By rule, if one process calls "init",
then ALL processes must call "init" prior to termination.

2. this process called "init", but exited without calling "finalize".
By rule, all processes that call "init" MUST call "finalize" prior to
exiting or it will be considered an "abnormal termination"

This may have caused other processes in the application to be
terminated by signals sent by mpirun (as reported here).
--------------------------------------------------------------------------
[dornroeschen.igpm.rwth-aachen.de:127152] 1 more process has sent help message help-mpi-btl-base.txt / btl:no-nics
[dornroeschen.igpm.rwth-aachen.de:127152] Set MCA parameter "orte_base_help_aggregate" to 0 to see all help / error messages
[dornroeschen.igpm.rwth-aachen.de:127152] 1 more process has sent help message help-mpi-api.txt / mpi-abort

I would be glad to get some pointers as to where this problem comes from.

Kind regards,
Niklas Fischer
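P.S.: From the "Different communicators in the two objects" message in MatView(), my guess is that the matrix and the viewer live on different MPI communicators. The following is only a sketch of that suspected pattern, not the actual code from petsctest.cpp; it assumes PETSc 3.4 and a matrix created on PETSC_COMM_WORLD:

    /* Hypothetical reduction of the suspected bug: the Mat lives on
     * PETSC_COMM_WORLD, but the viewer is bound to PETSC_COMM_SELF. */
    #include <petscmat.h>

    int main(int argc, char **argv)
    {
        Mat A;
        PetscInitialize(&argc, &argv, NULL, NULL);

        MatCreate(PETSC_COMM_WORLD, &A);
        MatSetSizes(A, PETSC_DECIDE, PETSC_DECIDE, 32, 32);
        MatSetFromOptions(A);
        MatSetUp(A);
        MatAssemblyBegin(A, MAT_FINAL_ASSEMBLY);
        MatAssemblyEnd(A, MAT_FINAL_ASSEMBLY);

        /* WRONG: PETSC_VIEWER_STDOUT_SELF lives on PETSC_COMM_SELF, so
         * with -n 2 this triggers "Arguments must have same communicators!" */
        MatView(A, PETSC_VIEWER_STDOUT_SELF);

        /* RIGHT: use a viewer on the same communicator as A, e.g.
         * MatView(A, PETSC_VIEWER_STDOUT_WORLD); */

        MatDestroy(&A);
        PetscFinalize();
        return 0;
    }

If that guess is wrong, the relevant MatView() call is at line 108 of the attached petsctest.cpp.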
Attachment: petsctest.tar.gz (application/gzip)
