[OMPI users] Signal: Segmentation fault (11) Signal code: Address not mapped (1)
Dear All,

I have installed Open MPI 1.3.2 in my home directory (/home/jean/openmpisof/) and BLCR in /usr/local/blcr. I have added the following to my .bashrc file:

export PATH=/home/jean/openmpisof/bin/:$PATH
export LD_LIBRARY_PATH=/home/jean/openmpisof/lib/:$LD_LIBRARY_PATH
export PATH=/usr/local/blcr/bin/:$PATH
export LD_LIBRARY_PATH=/usr/local/blcr/lib:$LD_LIBRARY_PATH

I am running my application as follows:

mpirun -am ft-enable-cr -mca btl ^openib -mca snapc_base_global_snapshot_dir /tmp mpitest

But I get the following error when I try to checkpoint the application:

[sun06:20513] *** Process received signal ***
[sun06:20513] Signal: Segmentation fault (11)
[sun06:20513] Signal code: Address not mapped (1)
[sun06:20513] Failing at address: 0x4
[sun06:20513] [ 0] [0xb7fab40c]
[sun06:20513] [ 1] /lib/libc.so.6(cfree+0x3b) [0xb79e468b]
[sun06:20513] [ 2] /usr/local/blcr/lib/libcr.so.0(cri_info_free+0x2a) [0xb7b1725a]
[sun06:20513] [ 3] /usr/local/blcr/lib/libcr.so.0 [0xb7b18c72]
[sun06:20513] [ 4] /lib/libc.so.6(__libc_fork+0x186) [0xb7a0d266]
[sun06:20513] [ 5] /lib/libpthread.so.0(fork+0x14) [0xb7ac4b24]
[sun06:20513] [ 6] /home/jean/openmpisof/lib/libopen-pal.so.0 [0xb7bc2a01]
[sun06:20513] [ 7] /home/jean/openmpisof/lib/libopen-pal.so.0(opal_crs_blcr_checkpoint+0x187) [0xb7bc231b]
[sun06:20513] [ 8] /home/jean/openmpisof/lib/libopen-pal.so.0(opal_cr_inc_core+0xc3) [0xb7b8eb1d]
[sun06:20513] [ 9] /home/jean/openmpisof/lib/libopen-rte.so.0 [0xb7cab40f]
[sun06:20513] [10] /home/jean/openmpisof/lib/libopen-pal.so.0(opal_cr_test_if_checkpoint_ready+0x129) [0xb7b8ea2a]
[sun06:20513] [11] /home/jean/openmpisof/lib/libopen-pal.so.0 [0xb7b8f0f8]
[sun06:20513] [12] /lib/libpthread.so.0 [0xb7abbf3b]
[sun06:20513] [13] /lib/libc.so.6(clone+0x5e) [0xb7a42bee]
[sun06:20513] *** End of error message ***

Any help will be much appreciated.

Regards,
Jean
Re: [OMPI users] Messages getting lost during transmission (?)
Dennis,

In MPI, you must complete every MPI_Isend with MPI_Wait on the request handle (or a variant such as MPI_Waitall, or MPI_Test returning true). An uncompleted MPI_Isend leaves resources tied up. I do not know exactly what symptom to expect from Open MPI for this particular application error, but the one you describe is plausible.

Dick Treumann - MPI Team
IBM Systems & Technology Group
Dept X2ZA / MS P963 -- 2455 South Road -- Poughkeepsie, NY 12601
Tele (845) 433-7846 Fax (845) 433-8363

users-boun...@open-mpi.org wrote on 09/09/2009 11:47:12 AM:

> [OMPI users] Messages getting lost during transmission (?)
> Dennis Luxen, to: users, 09/09/2009 11:48 AM
> Sent by: users-boun...@open-mpi.org
>
> Hi all,
>
> I have a very strange behaviour in a program. It seems that messages
> that are sent from one processor to another are getting lost.
>
> The problem is isolated in the attached source code. The code works as
> follows. Two processes send each other 100k requests. Each request is
> answered and triggers a number of requests to the other process in
> return. As you might already suspect, the communication is asynchronous.
>
> I already debugged the application and found that at one point during
> the communication at least one of the processes does not receive any
> messages anymore and hangs in the while loop beginning in line 45.
>
> The program is started with two processes on a single machine and no
> other parameters: "mpirun -np 2 ./mpi_test2".
>
> I appreciate your help.
> [...]
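Dick's rule above can be sketched in a few lines. This is an illustrative fragment with made-up names, not Dennis's program or an Open MPI internal; it shows an MPI_Isend being driven to completion with MPI_Test in a polling loop, so the request is never abandoned:

```c
#include <mpi.h>

/* Sketch: every MPI_Isend must eventually be completed. Reusing or
 * forgetting the request without MPI_Wait/MPI_Test ties up resources
 * inside the MPI library. */
void send_and_complete(int payload, int dest, MPI_Comm comm)
{
    MPI_Request req;
    int done = 0;

    MPI_Isend(&payload, 1, MPI_INT, dest, /*tag=*/0, comm, &req);

    /* Poll so other work (e.g. servicing incoming requests) can make
     * progress while the send drains. */
    while (!done) {
        MPI_Test(&req, &done, MPI_STATUS_IGNORE);
        /* ... receive and answer incoming messages here ... */
    }
    /* MPI_Test set req to MPI_REQUEST_NULL on completion; nothing leaks. */
}
```

In a request/answer pattern like Dennis's, keeping outstanding send requests in an array and periodically calling MPI_Testsome on it achieves the same thing without blocking.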
[OMPI users] Messages getting lost during transmission (?)
Hi all,

I have a very strange behaviour in a program. It seems that messages that are sent from one processor to another are getting lost.

The problem is isolated in the attached source code. The code works as follows. Two processes send each other 100k requests. Each request is answered and triggers a number of requests to the other process in return. As you might already suspect, the communication is asynchronous.

I already debugged the application and found that at one point during the communication at least one of the processes does not receive any messages anymore and hangs in the while loop beginning in line 45.

The program is started with two processes on a single machine and no other parameters: "mpirun -np 2 ./mpi_test2".

I appreciate your help.

Best wishes,
Dennis

--
Dennis Luxen
Universität Karlsruhe (TH)           | Fon : +49 (721) 608-6781
Institut für Theoretische Informatik | Fax : +49 (721) 608-3088
Am Fasanengarten 5, Zimmer 220       | WWW : algo2.ira.uka.de/luxen
D-76131 Karlsruhe, Germany           | Email: lu...@kit.edu

#include
#include
#include
#include
#include
#include
#include
#include

std::ofstream output_file;

enum {REQUEST_TAG=4321, ANSWER_TAG, FINISHED_TAG};

typedef int Answer_type;

int main(int argc, char *argv[])
{
   MPI_Init(&argc, &argv); // starts MPI
   int number_of_PEs, my_PE_ID;
   MPI_Comm_size(MPI_COMM_WORLD, &number_of_PEs);
   assert(number_of_PEs == 2);
   MPI_Comm_rank(MPI_COMM_WORLD, &my_PE_ID);

   std::srand(123456);

   int number_of_requests_to_send = 10;
   int number_of_requests_to_recv = number_of_requests_to_send;
   int number_of_answers_to_recv = number_of_requests_to_send;

   std::stringstream filename;
   filename<<"output"

   int buffer[100];
   MPI_Request dummy_request;

   //Send the first request
   MPI_Isend(buffer, 1, MPI_INT, 1-my_PE_ID, REQUEST_TAG,
             MPI_COMM_WORLD, &dummy_request);
   number_of_requests_to_send--;

   int working_PEs = number_of_PEs;
   bool lack_of_work_sent = false;
   bool there_was_change = true;
   while(working_PEs > 0)
   {
      if(there_was_change)
      {
         there_was_change = false;
         std::cout

   return 0;
}

                Package: Open MPI abuild@build26 Distribution
               Open MPI: 1.3.2
  Open MPI SVN revision: r21054
  Open MPI release date: Apr 21, 2009
               Open RTE: 1.3.2
  Open RTE SVN revision: r21054
  Open RTE release date: Apr 21, 2009
                   OPAL: 1.3.2
      OPAL SVN revision: r21054
      OPAL release date: Apr 21, 2009
           Ident string: 1.3.2
                 Prefix: /usr/lib64/mpi/gcc/openmpi
Configured architecture: x86_64-suse-linux-gnu
         Configure host: build26
          Configured by: abuild
          Configured on: Tue May 5 16:03:55 UTC 2009
         Configure host: build26
               Built by: abuild
               Built on: Tue May 5 16:18:52 UTC 2009
             Built host: build26
             C bindings: yes
           C++ bindings: yes
     Fortran77 bindings: yes (all)
     Fortran90 bindings: yes
Fortran90 bindings size: small
             C compiler: gcc
    C compiler absolute: /usr/bin/gcc
           C++ compiler: g++
  C++ compiler absolute: /usr/bin/g++
     Fortran77 compiler: gfortran
 Fortran77 compiler abs: /usr/bin/gfortran
     Fortran90 compiler: gfortran
 Fortran90 compiler abs: /usr/bin/gfortran
            C profiling: yes
          C++ profiling: yes
    Fortran77 profiling: yes
    Fortran90 profiling: yes
         C++ exceptions: no
         Thread support: posix (mpi: no, progress: no)
          Sparse Groups: no
 Internal debug support: no
    MPI parameter check: runtime
Memory profiling support: no
Memory debugging support: no
        libltdl support: yes
  Heterogeneous support: no
mpirun default --prefix: no
        MPI I/O support: yes
      MPI_WTIME support: gettimeofday
Symbol visibility support: yes
  FT Checkpoint support: no (checkpoint thread: no)
          MCA backtrace: execinfo (MCA v2.0, API v2.0, Component v1.3.2)
             MCA memory: ptmalloc2 (MCA v2.0, API v2.0, Component v1.3.2)
          MCA paffinity: linux (MCA v2.0, API v2.0, Component v1.3.2)
              MCA carto: auto_detect (MCA v2.0, API v2.0, Component v1.3.2)
              MCA carto: file (MCA v2.0, API v2.0, Component v1.3.2)
          MCA maffinity: first_use (MCA v2.0, API v2.0, Component v1.3.2)
              MCA timer: linux (MCA v2.0, API v2.0, Component v1.3.2)
        MCA installdirs: env (MCA v2.0, API v2.0, Component v1.3.2)
        MCA installdirs: config (MCA v2.0, API v2.0, Component v1.3.2)
                MCA dpm: orte (MCA v2.0, API v2.0, Component v1.3.2)
             MCA pubsub: orte (MCA v2.0, API v2.0, Component v1.3.2)
          MCA allocator: basic (MCA v2.0, API v2.0, Component v1.3.2)
          MCA allocator: bucket (MCA v2.0, API v2.0, Component v1.3.2)
               MCA coll: basic (MCA v2.0, API v2.0, Component v1.3.2)
               MCA coll: hierarch (MCA v2.0, API v2.0, Component v1.3.2)
               MCA coll: inter (MCA v2.0, API v2.0, Component v1.3.2)
               MCA coll: self (MCA v2.0, API v2.0, Component v1.3.2)
               MCA coll: sm (MCA v2.0, API v2.0, Component v1.3.2)
Re: [OMPI users] [OMPI devel] Error message improvement
__func__ is what you should use. We take care of having it defined in _all_ cases. If the compiler doesn't support it, we define it manually (to __FUNCTION__, or to __FILE__ in the worst case), so it is always available (even if it doesn't contain what one might expect, as in the __FILE__ case).

  george.

On Sep 9, 2009, at 14:16, Lenny Verkhovsky wrote:

Hi All,
Is a C99-compliant compiler something unusual, or is there a policy among OMPI developers/users that prevents me from using __func__ instead of hardcoded strings in the code?
Thanks.
Lenny.

On Wed, Sep 9, 2009 at 1:48 PM, Nysal Jan wrote:
__FUNCTION__ is not portable. __func__ is, but it needs a C99-compliant compiler.
--Nysal

On Tue, Sep 8, 2009 at 9:06 PM, Lenny Verkhovsky wrote:
fixed in r21952
thanks.

On Tue, Sep 8, 2009 at 5:08 PM, Arthur Huillet wrote:
Lenny Verkhovsky wrote:
Why not use __FUNCTION__ in all our error messages?

Sounds good, this way the function names are always correct.

--
Greetings, A. Huillet

___
devel mailing list
de...@open-mpi.org
http://www.open-mpi.org/mailman/listinfo.cgi/devel
Re: [OMPI users] [OMPI devel] Error message improvement
Hi All,
Is a C99-compliant compiler something unusual, or is there a policy among OMPI developers/users that prevents me from using __func__ instead of hardcoded strings in the code?
Thanks.
Lenny.

On Wed, Sep 9, 2009 at 1:48 PM, Nysal Jan wrote:
> __FUNCTION__ is not portable.
> __func__ is, but it needs a C99-compliant compiler.
>
> --Nysal
>
> On Tue, Sep 8, 2009 at 9:06 PM, Lenny Verkhovsky <
> lenny.verkhov...@gmail.com> wrote:
>
>> fixed in r21952
>> thanks.
>>
>> On Tue, Sep 8, 2009 at 5:08 PM, Arthur Huillet
>> wrote:
>>
>>> Lenny Verkhovsky wrote:
>>> Why not use __FUNCTION__ in all our error messages?
>>>
>>> Sounds good, this way the function names are always correct.
>>>
>>> --
>>> Greetings, A. Huillet
>>>
>>> ___
>>> devel mailing list
>>> de...@open-mpi.org
>>> http://www.open-mpi.org/mailman/listinfo.cgi/devel
Re: [OMPI users] SVD with mpi
Attila Börcs wrote:

> Hi Everyone,
>
> I'd like to compute a singular value decomposition with MPI. I have heard
> about the Lanczos algorithm and a few other algorithms for SVD, but I need
> some help on this topic. Does anybody know of usable code or a tutorial on
> parallel SVD?
>
> Best Regards,
> Attila

If you need a full decomposition, ScaLAPACK is the best. Otherwise, you may take a look at SLEPc (which uses the PETSc framework).

Yann
Re: [OMPI users] SVD with mpi
Take a look at http://www.netlib.org/scalapack/

Ciao
Terry

On Tue, 2009-09-08 at 13:55 +0200, Attila Börcs wrote:
> Hi Everyone,
>
> I'd like to compute a singular value decomposition with MPI. I have
> heard about the Lanczos algorithm and a few other algorithms for SVD,
> but I need some help on this topic. Does anybody know of usable code
> or a tutorial on parallel SVD?
>
> Best Regards,
>
> Attila
> ___
> users mailing list
> us...@open-mpi.org
> http://www.open-mpi.org/mailman/listinfo.cgi/users