On Sat, Aug 6, 2011 at 4:12 AM, Barry Smith <bsmith at mcs.anl.gov> wrote:
>
>   Does the PETSc example src/vec/vec/examples/tutorials/ex1.c run correctly on
> 8+ processes?
Yes, as per:

dsz@nexo:~/pack/petsc-3.1-p8/src/vec/vec/examples/tutorials$ ~/pack/petsc-3.1-p8/externalpackages/mpich2-1.0.8/bin/mpiexec -np 12 ./ex1
Vector length 20
Vector length 20 40 60
All other values should be near zero
VecScale 0
VecCopy 0
VecAXPY 0
VecAYPX 0
VecSwap 0
VecSwap 0
VecWAXPY 0
VecPointwiseMult 0
VecPointwiseDivide 0
VecMAXPY 0 0 0

>   Are you sure the MPI shared libraries are the same on both systems?

I was not precise: I have only one system, consisting of two 6-core Intels, 12 cores in total. I do have OpenMPI installed alongside, but I was explicitly calling the mpiexec from the PETSc external packages.

>   You can try the option -on_error_attach_debugger

When run with -np 12 it only opens 6 windows, saying:

[9]PETSC ERROR: MPI error 14
[1]PETSC ERROR: MPI error 14
[7]PETSC ERROR: MPI error 14
[9]PETSC ERROR: PETSC: Attaching gdb to /home/dsz/build/framework-debug/trunk/bin/sm3t4mpi of pid 11798 on display localhost:11.0 on machine nexo
[1]PETSC ERROR: PETSC: Attaching gdb to /home/dsz/build/framework-debug/trunk/bin/sm3t4mpi of pid 11790 on display localhost:11.0 on machine nexo
[7]PETSC ERROR: PETSC: Attaching gdb to /home/dsz/build/framework-debug/trunk/bin/sm3t4mpi of pid 11796 on display localhost:11.0 on machine nexo
[9]PETSC ERROR: PetscGatherNumberOfMessages() line 62 in src/sys/utils/mpimesg.c
[1]PETSC ERROR: PetscGatherNumberOfMessages() line 62 in src/sys/utils/mpimesg.c
[7]PETSC ERROR: PetscGatherNumberOfMessages() line 62 in src/sys/utils/mpimesg.c
[1]PETSC ERROR: PETSC: Attaching gdb to /home/dsz/build/framework-debug/trunk/bin/sm3t4mpi of pid 11790 on display localhost:11.0 on machine nexo
[9]PETSC ERROR: PETSC: Attaching gdb to /home/dsz/build/framework-debug/trunk/bin/sm3t4mpi of pid 11798 on display localhost:11.0 on machine nexo
[7]PETSC ERROR: PETSC: Attaching gdb to /home/dsz/build/framework-debug/trunk/bin/sm3t4mpi of pid 11796 on display localhost:11.0 on machine nexo

Starting the program in the 6 windows with its expected args then results in:

[cli_9]: PMIU_parse_keyvals: unexpected key delimiter at character 54 in cmd
[cli_9]: parse_kevals failed -1

I will not be able to do proper valgrinding/purifying before next week (see the valgrind invocation sketch after the quoted log below). In the meantime I would still appreciate any hints.

Regards,
Dominik

>
>   Barry
>
> On Aug 5, 2011, at 4:41 PM, Dominik Szczerba wrote:
>
>> I have a 2x6-core machine. My solver works fine on up to 8 processes;
>> above that it always crashes with the error cited below. I have not yet
>> run valgrind etc. because I am in desperate need of a quick fix. I am
>> just wondering what could potentially be the culprit.
>>
>> PS. I am not using MPI_Allreduce anywhere in my code.
>>
>> Many thanks for any hints,
>> Dominik
>>
>> Fatal error in MPI_Allreduce: Error message texts are not
>> available[cli_9]: aborting job:
>> Fatal error in MPI_Allreduce: Error message texts are not available
>> Fatal error in MPI_Allreduce: Error message texts are not
>> available[cli_1]: aborting job:
>> Fatal error in MPI_Allreduce: Error message texts are not available
>> Fatal error in MPI_Allreduce: Error message texts are not
>> available[cli_7]: aborting job:
>> Fatal error in MPI_Allreduce: Error message texts are not available
>> INTERNAL ERROR: Invalid error class (66) encountered while returning from
>> MPI_Allreduce.  Please file a bug report.  No error stack is available.
>> Fatal error in MPI_Allreduce: Error message texts are not
>> available[cli_11]: aborting job:
>> Fatal error in MPI_Allreduce: Error message texts are not available
>> [0]PETSC ERROR:
>> ------------------------------------------------------------------------
>> [0]PETSC ERROR: Caught signal number 3 Quit: Some other process (or
>> the batch system) has told this process to end
>> [0]PETSC ERROR: Try option -start_in_debugger or -on_error_attach_debugger
>> [0]PETSC ERROR: or see
>> http://www.mcs.anl.gov/petsc/petsc-as/documentation/troubleshooting.html#Signal[0]PETSC
>> ERROR: or try http://valgrind.org on GNU/linux and Apple Mac OS X to
>> find memory corruption errors
>> [0]PETSC ERROR: likely location of problem given in stack below
>> [0]PETSC ERROR: ---------------------  Stack Frames
>> ------------------------------------
>> [0]PETSC ERROR: Note: The EXACT line numbers in the stack are not available,
>> [0]PETSC ERROR:       INSTEAD the line number of the start of the function
>> [0]PETSC ERROR:       is given.
>> [0]PETSC ERROR: [0] MatAssemblyBegin_MPIAIJ line 462
>> src/mat/impls/aij/mpi/mpiaij.c
>> [0]PETSC ERROR: [0] MatAssemblyBegin line 4553 src/mat/interface/matrix.c
>> [0]PETSC ERROR: [0] User provided functi[2]PETSC ERROR:
>> ------------------------------------------------------------------------
>> [2]PETSC ERROR: Caught signal number 3 Quit: Some other process (or
>> the batch system) has told this process to end
>> [2]PETSC ERROR: Try option -start_in_debugger or -on_error_attach_debugger
>> [2]PETSC ERROR: or see
>> http://www.mcs.anl.gov/petsc/petsc-as/documentation/troubleshooting.html#Signal[2]PETSC
>> ERROR: or try http://valgrind.org on GNU/linux and Apple Mac OS X to
>> find memory corruption errors
>> [2]PETSC ERROR: likely location of problem given in stack below
>> [2]PETSC ERROR: ---------------------  Stack Frames
>> ------------------------------------
>> [2]PETSC ERROR: Note: The EXACT line numbers in the stack are not available,
>> [2]PETSC ERROR:       INSTEAD the line number of the start of the function
>> [2]PETSC ERROR:       is given.
>> [2]PETSC ERROR: [2] VecAssemblyBegin line 157 src/vec/vec/interface/vector.c
>> [2]PETSC ERROR: [2] User provided function line 160
>> "unknowndirectory/"/home/dsz/src/framework/trunk/solve/SM3T4mpi.cxx
>> [INTERNAL ERROR: Invalid error class (66) encountered while returning from
>> MPI_Allreduce.  Please file a bug report.  No error stack is available.
>> Fatal error in MPI_Allreduce: Error message texts are not
>> available[cli_3]: aborting job:
>> Fatal error in MPI_Allreduce: Error message texts are not available
>> [4]PETSC ERROR:
>> ------------------------------------------------------------------------
>> [4]PETSC ERROR: Caught signal number 3 Quit: Some other process (or
>> the batch system) has told this process to end
>> [4]PETSC ERROR: Try option -start_in_debugger or -on_error_attach_debugger
>> [4]PETSC ERROR: or see
>> http://www.mcs.anl.gov/petsc/petsc-as/documentation/troubleshooting.html#Signal[4]PETSC
>> ERROR: or try http://valgrind.org on GNU/linux and Apple Mac OS X to
>> find memory corruption errors
>> [4]PETSC ERROR: likely location of problem given in stack below
>> [4]PETSC ERROR: ---------------------  Stack Frames
>> ------------------------------------
>> [4]PETSC ERROR: Note: The EXACT line numbers in the stack are not available,
>> [4]PETSC ERROR:       INSTEAD the line number of the start of the function
>> [4]PETSC ERROR:       is given.
>> [4]PETSC ERROR: [4] MatAssemblyBegin_MPIAIJ line 462
>> src/mat/impls/aij/mpi/mpiaij.c
>> [4]PETSC ERROR: [4] MatAssemblyBegin line 4553 src/mat/interface/matrix.c
>> [4]PETSC ERROR: [4] User provided functiINTERNAL ERROR: Invalid error
>> class (66) encountered while returning from
>> MPI_Allreduce.  Please file a bug report.  No error stack is available.
>> Fatal error in MPI_Allreduce: Error message texts are not
>> available[cli_5]: aborting job:
>> Fatal error in MPI_Allreduce: Error message texts are not available
>> [6]PETSC ERROR:
>> ------------------------------------------------------------------------
>> [6]PETSC ERROR: Caught signal number 3 Quit: Some other process (or
>> the batch system) has told this process to end
>> [6]PETSC ERROR: Try option -start_in_debugger or -on_error_attach_debugger
>> [6]PETSC ERROR: or see
>> http://www.mcs.anl.gov/petsc/petsc-as/documentation/troubleshooting.html#Signal[6]PETSC
>> ERROR: or try http://valgrind.org on GNU/linux and Apple Mac OS X to
>> find memory corruption errors
>> [6]PETSC ERROR: likely location of problem given in stack below
>> [6]PETSC ERROR: ---------------------  Stack Frames
>> ------------------------------------
>> [6]PETSC ERROR: Note: The EXACT line numbers in the stack are not available,
>> [6]PETSC ERROR:       INSTEAD the line number of the start of the function
>> [6]PETSC ERROR:       is given.
>> [6]PETSC ERROR: [6] MatAssemblyBegin_MPIAIJ line 462
>> src/mat/impls/aij/mpi/mpiaij.c
>> [6]PETSC ERROR: [6] MatAssemblyBegin line 4553 src/mat/interface/matrix.c
>> [6]PETSC ERROR: [6] User provided functiINTERNAL ERROR: Invalid error
>> class (66) encountered while returning from
>> MPI_Allreduce.  Please file a bug report.  No error stack is available.
>> Fatal error in MPI_Allreduce: Error message texts are not
>> available[cli_8]: aborting job:
>> Fatal error in MPI_Allreduce: Error message texts are not available
>> [10]PETSC ERROR:
>> ------------------------------------------------------------------------
>> [10]PETSC ERROR: Caught signal number 3 Quit: Some other process (or
>> the batch system) has told this process to end
>> [10]PETSC ERROR: Try option -start_in_debugger or -on_error_attach_debugger
>> [10]PETSC ERROR: or see
>> http://www.mcs.anl.gov/petsc/petsc-as/documentation/troubleshooting.html#Signal[10]PETSC
>> ERROR: or try http://valgrind.org on GNU/linux and Apple Mac OS X to
>> find memory corruption errors
>> [10]PETSC ERROR: likely location of problem given in stack below
>> [10]PETSC ERROR: ---------------------  Stack Frames
>> ------------------------------------
>> [10]PETSC ERROR: Note: The EXACT line numbers in the stack are not available,
>> [10]PETSC ERROR:       INSTEAD the line number of the start of the function
>> [10]PETSC ERROR:       is given.
>> [10]PETSC ERROR: [10] VecAssemblyBegin line 157
>> src/vec/vec/interface/vector.c
>> [10]PETSC ERROR: [10] User provided function line 160
>> "unknowndirectory/"/home/dsz/src/framework/trunk/solve/on line 294
>> "unknowndirectory/"/home/dsz/src/framework/trunk/solve/SM3T4mpi.cxx
>> [0]PETSC ERROR: [0] User provided function line 627
>> "unknowndirectory/"/home/dsz/src/framework/trunk/solve/SM3T4mpi.cxx
>> [0]PETSC ERROR: --------------------- Error Message
>> ------------------------------------
>> [0]PETSC ERROR: Signal received!
>> [0]PETSC ERROR:
>> ------------------------------------------------------------------------
>> [0]PETSC ERROR: Petsc Release Version 3.1.0, Patch 8, Thu Mar 17
>> 13:37:48 CDT 2011
>> [0]PETSC ERROR: See docs/changes/index.html for recent updates.
>> [0]PETSC ERROR: See docs/faq.html for hints about trouble shooting.
>> [0]PETSC ERROR: See docs/index.html for manual pages.
>> [0]PETSC ERROR:
>> ------------------------------------------------------------------------
>> [0]PETSC ERROR: Unknown Name on a linux-gnu named nexo by dsz Sat Aug
>> 6 00:35:58 2011
>> [0]PETSC ERROR: Libraries linked from
>> /home/dsz/pack/petsc-3.1-p8/linux-gnu-c-debug/lib
>> [0]PETSC ERROR: Configure run at Sat Aug  6 00:02:58 2011
>> [0]PETSC ERROR: Config2]PETSC ERROR: [2] User provided function line
>> 294 "unknowndirectory/"/home/dsz/src/framework/trunk/solve/SM3T4mpi.cxx
>> [2]PETSC ERROR: [2] User provided function line 627
>> "unknowndirectory/"/home/dsz/src/framework/trunk/solve/SM3T4mpi.cxx
>> [2]PETSC ERROR: --------------------- Error Message
>> ------------------------------------
>> [2]PETSC ERROR: Signal received!
>> [2]PETSC ERROR:
>> ------------------------------------------------------------------------
>> [2]PETSC ERROR: Petsc Release Version 3.1.0, Patch 8, Thu Mar 17
>> 13:37:48 CDT 2011
>> [2]PETSC ERROR: See docs/changes/index.html for recent updates.
>> [2]PETSC ERROR: See docs/faq.html for hints about trouble shooting.
>> [2]PETSC ERROR: See docs/index.html for manual pages.
>> [2]PETSC ERROR:
>> ------------------------------------------------------------------------
>> [2]PETSC ERROR: Unknown Name on a linux-gnu named nexo by dsz Sat Aug
>> 6 00:35:58 2011
>> [2]PETSC ERROR: Libraries linked from
>> /home/dsz/pack/petsc-3.1-p8/linux-gnu-c-debug/lib
>> [2]PETSC ERROR: Configure run at Sat Aug on line 294
>> "unknowndirectory/"/home/dsz/src/framework/trunk/solve/SM3T4mpi.cxx
>> [4]PETSC ERROR: [4] User provided function line 627
>> "unknowndirectory/"/home/dsz/src/framework/trunk/solve/SM3T4mpi.cxx
>> [4]PETSC ERROR: --------------------- Error Message
>> ------------------------------------
>> [4]PETSC ERROR: Signal received!
>> [4]PETSC ERROR:
>> ------------------------------------------------------------------------
>> [4]PETSC ERROR: Petsc Release Version 3.1.0, Patch 8, Thu Mar 17
>> 13:37:48 CDT 2011
>> [4]PETSC ERROR: See docs/changes/index.html for recent updates.
>> [4]PETSC ERROR: See docs/faq.html for hints about trouble shooting.
>> [4]PETSC ERROR: See docs/index.html for manual pages.
>> [4]PETSC ERROR:
>> ------------------------------------------------------------------------
>> [4]PETSC ERROR: Unknown Name on a linux-gnu named nexo by dsz Sat Aug
>> 6 00:35:58 2011
>> [4]PETSC ERROR: Libraries linked from
>> /home/dsz/pack/petsc-3.1-p8/linux-gnu-c-debug/lib
>> [4]PETSC ERROR: Configure run at Sat Aug  6 00:02:58 2011
>> [4]PETSC ERROR: Configon line 294
>> "unknowndirectory/"/home/dsz/src/framework/trunk/solve/SM3T4mpi.cxx
>> [6]PETSC ERROR: [6] User provided function line 627
>> "unknowndirectory/"/home/dsz/src/framework/trunk/solve/SM3T4mpi.cxx
>> [6]PETSC ERROR: --------------------- Error Message
>> ------------------------------------
>> [6]PETSC ERROR: Signal received!
>> [6]PETSC ERROR:
>> ------------------------------------------------------------------------
>> [6]PETSC ERROR: Petsc Release Version 3.1.0, Patch 8, Thu Mar 17
>> 13:37:48 CDT 2011
>> [6]PETSC ERROR: See docs/changes/index.html for recent updates.
>> [6]PETSC ERROR: See docs/faq.html for hints about trouble shooting.
>> [6]PETSC ERROR: See docs/index.html for manual pages.
>> [6]PETSC ERROR:
>> ------------------------------------------------------------------------
>> [6]PETSC ERROR: Unknown Name on a linux-gnu named nexo by dsz Sat Aug
>> 6 00:35:58 2011
>> [6]PETSC ERROR: Libraries linked from
>> /home/dsz/pack/petsc-3.1-p8/linux-gnu-c-debug/lib
>> [6]PETSC ERROR: Configure run at Sat Aug  6 00:02:58 2011
>> [6]PETSC ERROR: ConfigSM3T4mpi.cxx
>> [10]PETSC ERROR: [10] User provided function line 294
>> "unknowndirectory/"/home/dsz/src/framework/trunk/solve/SM3T4mpi.cxx
>> [10]PETSC ERROR: [10] User provided function line 627
>> "unknowndirectory/"/home/dsz/src/framework/trunk/solve/SM3T4mpi.cxx
>> [10]PETSC ERROR: --------------------- Error Message
>> ------------------------------------
>> [10]PETSC ERROR: Signal received!
>> [10]PETSC ERROR:
>> ------------------------------------------------------------------------
>> [10]PETSC ERROR: Petsc Release Version 3.1.0, Patch 8, Thu Mar 17
>> 13:37:48 CDT 2011
>> [10]PETSC ERROR: See docs/changes/index.html for recent updates.
>> [10]PETSC ERROR: See docs/faq.html for hints about trouble shooting.
>> [10]PETSC ERROR: See docs/index.html for manual pages.
>> [10]PETSC ERROR:
>> ------------------------------------------------------------------------
>> [10]PETSC ERROR: Unknown Name on a linux-gnu named nexo by dsz Sat Aug
>> 6 00:35:58 2011
>> [10]PETSC ERROR: Libraries linked from
>> /home/dsz/pack/petsc-3.1-p8/linux-gnu-c-debug/lib
>> [10]PETSC ERRure options PETSC_DIR=/home/dsz/pack/petsc-3.1-p8
>> PETSC_ARCH=linux-gnu-c-debug --download-f-blas-lapack=1
>> --download-mpich=1 --download-hypre=1 --with-parmetis=1
>> --download-parmetis=1 --with-x=0 --with-debugging=1
>> [0]PETSC ERROR:
>> ------------------------------------------------------------------------
>> [0]PETSC ERROR: User provided function() line 0 in unknown directory
>> unknown file
>> application called MPI_Abort(MPI_COMM_WORLD, 59) - process 0[cli_0]:
>> aborting job:
>> application called MPI_Abort(MPI_COMM_WORLD, 59) - process 0
>> 6 00:02:58 2011
>> [2]PETSC ERROR: Configure options
>> PETSC_DIR=/home/dsz/pack/petsc-3.1-p8 PETSC_ARCH=linux-gnu-c-debug
>> --download-f-blas-lapack=1 --download-mpich=1 --download-hypre=1
>> --with-parmetis=1 --download-parmetis=1 --with-x=0 --with-debugging=1
>> [2]PETSC ERROR:
>> ------------------------------------------------------------------------
>> [2]PETSC ERROR: User provided function() line 0 in unknown directory
>> unknown file
>> application called MPI_Abort(MPI_COMM_WORLD, 59) - process 2[cli_2]:
>> aborting job:
>> application called MPI_Abort(MPI_COMM_WORLD, 59) - process 2
>> ure options PETSC_DIR=/home/dsz/pack/petsc-3.1-p8
>> PETSC_ARCH=linux-gnu-c-debug --download-f-blas-lapack=1
>> --download-mpich=1 --download-hypre=1 --with-parmetis=1
>> --download-parmetis=1 --with-x=0 --with-debugging=1
>> [4]PETSC ERROR:
>> ------------------------------------------------------------------------
>> [4]PETSC ERROR: User provided function() line 0 in unknown directory
>> unknown file
>> application called MPI_Abort(MPI_COMM_WORLD, 59) - process 4[cli_4]:
>> aborting job:
>> application called MPI_Abort(MPI_COMM_WORLD, 59) - process 4
>> ure options PETSC_DIR=/home/dsz/pack/petsc-3.1-p8
>> PETSC_ARCH=linux-gnu-c-debug --download-f-blas-lapack=1
>> --download-mpich=1 --download-hypre=1 --with-parmetis=1
>> --download-parmetis=1 --with-x=0 --with-debugging=1
>> [6]PETSC ERROR:
>> ------------------------------------------------------------------------
>> [6]PETSC ERROR: User provided function() line 0 in unknown directory
>> unknown file
>> application called MPI_Abort(MPI_COMM_WORLD, 59) - process 6[cli_6]:
>> aborting job:
>> application called MPI_Abort(MPI_COMM_WORLD, 59) - process 6
>> OR: Configure run at Sat Aug  6 00:02:58 2011
>> [10]PETSC ERROR: Configure options
>> PETSC_DIR=/home/dsz/pack/petsc-3.1-p8 PETSC_ARCH=linux-gnu-c-debug
>> --download-f-blas-lapack=1 --download-mpich=1 --download-hypre=1
>> --with-parmetis=1 --download-parmetis=1 --with-x=0 --with-debugging=1
>> [10]PETSC ERROR:
>> ------------------------------------------------------------------------
>> [10]PETSC ERROR: User provided function() line 0 in unknown directory
>> unknown file
>> application called MPI_Abort(MPI_COMM_WORLD, 59) - process 10[cli_10]:
>> aborting job:
>> application called MPI_Abort(MPI_COMM_WORLD, 59) - process 10
>
>
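Regarding the valgrind run mentioned above: the usual way to valgrind a PETSc/MPICH program is to launch one valgrind instance per rank through mpiexec. A minimal sketch, assuming the same mpiexec and solver binary as in the session above (the solver's own arguments are elided, and the log-file name is just an example):

  ~/pack/petsc-3.1-p8/externalpackages/mpich2-1.0.8/bin/mpiexec -np 12 \
      valgrind -q --tool=memcheck --num-callers=20 --log-file=valgrind.%p.log \
      /home/dsz/build/framework-debug/trunk/bin/sm3t4mpi <solver args> -malloc off

The PETSc option -malloc off makes PETSc use plain malloc instead of its own logging allocator, so memcheck sees the real allocations; the %p in --log-file gives each rank its own log file.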

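A note on the stack frames: MatAssemblyBegin() and VecAssemblyBegin() are collective operations, and internally they go through PetscGatherNumberOfMessages(), which performs an MPI_Allreduce. That is why MPI_Allreduce shows up in the crash even though the solver never calls it directly. One classic way such code breaks only above some process count is a rank-dependent branch that lets some ranks skip an assembly call, leaving the other ranks stranded inside the reduction. A minimal sketch of the required pattern against petsc-3.1 (a hypothetical matrix and sizes for illustration, not the actual SM3T4mpi code):

  #include "petscmat.h"

  int main(int argc, char **argv)
  {
    Mat            A;
    PetscErrorCode ierr;
    PetscInt       i, rstart, rend, N = 100;
    PetscScalar    one = 1.0;

    ierr = PetscInitialize(&argc, &argv, PETSC_NULL, PETSC_NULL);CHKERRQ(ierr);

    /* Parallel AIJ matrix distributed across PETSC_COMM_WORLD */
    ierr = MatCreate(PETSC_COMM_WORLD, &A);CHKERRQ(ierr);
    ierr = MatSetSizes(A, PETSC_DECIDE, PETSC_DECIDE, N, N);CHKERRQ(ierr);
    ierr = MatSetFromOptions(A);CHKERRQ(ierr);
    ierr = MatGetOwnershipRange(A, &rstart, &rend);CHKERRQ(ierr);

    /* Each rank inserts only the entries it owns ... */
    for (i = rstart; i < rend; i++) {
      ierr = MatSetValues(A, 1, &i, 1, &i, &one, INSERT_VALUES);CHKERRQ(ierr);
    }

    /* ... but EVERY rank must reach this pair, even a rank that
       inserted nothing. A rank-dependent early return or branch that
       skips either call leaves the remaining ranks stuck in the
       collective communication inside assembly. */
    ierr = MatAssemblyBegin(A, MAT_FINAL_ASSEMBLY);CHKERRQ(ierr);
    ierr = MatAssemblyEnd(A, MAT_FINAL_ASSEMBLY);CHKERRQ(ierr);

    ierr = MatDestroy(A);CHKERRQ(ierr);
    ierr = PetscFinalize();
    return 0;
  }

The same applies to VecAssemblyBegin/VecAssemblyEnd around VecSetValues. Since the trace shows rank 0 inside MatAssemblyBegin while ranks 2 and 10 are inside VecAssemblyBegin, it may be worth checking that all 12 ranks execute the same sequence of collective calls around lines 160, 294, and 627 of SM3T4mpi.cxx.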