I have reconfigured with the additional flags (--enable-ft-thread
--enable-mpi-threads), but the behaviour is unchanged: ompi-restart
still segfaults.
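
For reference, the rebuild was roughly the following: the configure line
from my earlier mail with the two new flags appended. The prefix and the
BLCR path are carried over from that mail and will differ on other setups.

```shell
# Run from the top of the Open MPI source tree.
# Assumption: same prefix and BLCR location as in the earlier mail.
./configure --prefix=/home/csgrad/audhakne/.openmpi \
            --with-ft=cr --with-blcr=/usr/lib64 --enable-debug \
            --enable-ft-thread --enable-mpi-threads
make
make install
```
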
Open MPI version:
Open MPI: 1.3a1r19685

BLCR version:
version 0.7.3


The core file is attached, along with hello.c, the sample MPI program
that produced the core dump.
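
In case it is quicker than unpacking the attachment, a backtrace can
usually be pulled from such a core with gdb along the lines below. This is
only a sketch: the binary name ./hello and the core file name core.11288
are assumptions, since the actual core name depends on the system's core
pattern.

```shell
# Make sure core dumps are not disabled in the shell that launches the job.
ulimit -c unlimited

# Print a backtrace for every thread from the core file, non-interactively.
# Assumptions: binary is ./hello, core file is core.11288.
gdb -batch -ex "thread apply all bt" ./hello core.11288
```
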

~]$ ompi-restart ompi_global_snapshot_11219.ckpt
--------------------------------------------------------------------------
mpirun noticed that process rank 0 with PID 11288 on node
acl-cadi-pentd-1.cse.buffalo.edu exited on signal 11 (Segmentation
fault).
--------------------------------------------------------------------------
2 total processes killed (some possibly by mpirun during cleanup)
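
For anyone trying to reproduce this, the usual checkpoint/restart sequence
with such a build looks roughly like the sketch below. The hostfile name
"hosts", the process count, the binary name, and the sleep interval are
illustrative assumptions, not the exact commands from my run.

```shell
# Launch the job with the checkpoint/restart parameter set enabled.
# Assumptions: hostfile "hosts", 4 processes, binary ./hello.
mpirun -np 4 --hostfile hosts -am ft-enable-cr ./hello &
MPIRUN_PID=$!

# Give the job time to start, then checkpoint it via the PID of mpirun.
sleep 10
ompi-checkpoint "$MPIRUN_PID"

# Later, once the original job has exited, restart from the global
# snapshot, which by default is named after the mpirun PID.
ompi-restart "ompi_global_snapshot_${MPIRUN_PID}.ckpt"
```
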


Best,


On Mon, Oct 6, 2008 at 6:44 PM, Josh Hursey <jjhur...@open-mpi.org> wrote:
> The installation looks OK, though I'm not sure what is causing the segfault
> of the restarted process. Two things to try. First, can you send me a
> backtrace from the core file generated by the segmentation fault? That
> will provide insight into what is causing it.
>
> Second, you may try enabling the C/R thread, which allows a checkpoint to
> progress while the application is in a computation loop instead of only
> when it is in the MPI library. To do so, configure with these additional flags:
>  --enable-ft-thread --enable-mpi-threads
>
> What version of Open MPI are you using? What version of BLCR?
>
> Best,
> Josh
>
> On Oct 6, 2008, at 3:55 PM, arun dhakne wrote:
>
>> Hi all,
>>
>> This is the procedure I followed to install Open MPI. Is there an
>> installation or environment-setting problem here?
>> An Open MPI program with 4 processes is run across 2 dual-core Intel
>> machines, with 2 processes running on each machine.
>>
>> ompi-checkpoint succeeds, but ompi-restart fails with the following error:
>>
>>
>> $:> ompi-restart ompi_global_snapshot_6045.ckpt
>> --------------------------------------------------------------------------
>> mpirun noticed that process rank 0 with PID 6372 on node
>> acl-cadi-pentd-1.cse.buffalo.edu exited on signal 11 (Segmentation
>> fault).
>> --------------------------------------------------------------------------
>>
>> Open MPI installation steps:
>> ./configure --prefix=/home/csgrad/audhakne/.openmpi --with-ft=cr --with-blcr=/usr/lib64 --enable-debug
>> make
>> make install
>>
>>
>>
>> export LD_LIBRARY_PATH=$HOME/.openmpi/lib/:$HOME/.openmpi/lib/openmpi:/usr/lib64
>> export PATH=$HOME/.openmpi/bin:$PATH
>>
>> NOTE: blcr is installed as a module
>> $:> lsmod | grep blcr
>>
>> blcr                  117892  0
>> blcr_vmadump           58264  1 blcr
>> blcr_imports           46080  2 blcr,blcr_vmadump
>>
>> Please let me know if there is a problem with the above procedure.
>> Thanks a lot for your time.
>>
>> Best.
>>
>> ---------- Forwarded message ----------
>> From: arun dhakne <arundha...@gmail.com>
>> Date: Tue, Sep 30, 2008 at 12:52 AM
>> Subject: ompi-restart issue : ompi-restart doesn't work across nodes
>> To: Open MPI Users <us...@open-mpi.org>
>>
>>
>> Hi all,
>>
>> I went through some previous ompi-restart issues, but I couldn't
>> find anything similar to this problem.
>>
>> I have installed BLCR and configured Open MPI 'openmpi-1.3a1r19645'.
>>
>> i) If the sample MPI program is run on a single machine (np 4, without
>> any hostfile) and I checkpoint it, the checkpoint succeeds and
>> ompi-restart works as well.
>>
>> ii) If the sample MPI program is run across, say, 2 different nodes, the
>> checkpoint succeeds BUT ompi-restart throws the following error:
>>
>> $ ompi-restart ompi_global_snapshot_7604.ckpt
>> --------------------------------------------------------------------------
>> mpirun noticed that process rank 3 with PID 9590 on node
>> acl-cadi-pentd-1.cse.buffalo.edu exited on signal 11 (Segmentation
>> fault).
>> --------------------------------------------------------------------------
>>
>> Please let me know if more information is needed.
>>
>> --
>> Thanks and Regards,
>> Arun U. Dhakne
>> _______________________________________________
>> users mailing list
>> us...@open-mpi.org
>> http://www.open-mpi.org/mailman/listinfo.cgi/users
>



-- 
Thanks and Regards,
Arun U. Dhakne
Graduate Student
Computer Science and Engineering Dept.
State University of New York at Buffalo

Attachment: core.tar.gz
Description: GNU Zip compressed data

#include <stdio.h>
#include <unistd.h>   /* sleep() */
#include <mpi.h>

int main(int argc, char *argv[])
{
    int rank, size;
    int i;
    int send, recv;

    MPI_Init(&argc, &argv);                 /* starts MPI */
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);   /* get current process id */
    MPI_Comm_size(MPI_COMM_WORLD, &size);   /* get number of processes */

    printf("Hello world from process %d of %d\n", rank, size);

    for (i = 0; i < 100; i++) {
        send = i;
        if (rank == 0) {
            /* rank 0 sends the loop counter to rank 1 */
            MPI_Send(&send, 1, MPI_INT, 1, 1, MPI_COMM_WORLD);
        } else if (rank == 1) {
            /* only rank 1 has a matching send; other ranks skip the
               receive so they do not block forever */
            MPI_Recv(&recv, 1, MPI_INT, 0, MPI_ANY_TAG, MPI_COMM_WORLD,
                     MPI_STATUS_IGNORE);   /* portable form of "no status" */
            printf("Process %d says %d\n", rank, recv);
        }

        printf("Process %d says %d\n", rank, i);
        sleep(3);   /* pause between iterations */
    }

    MPI_Finalize();
    return 0;
}
