[OMPI users] DMTCP: Checkpoint-Restart solution for OpenMPI
Hi All,

On January 29, 2010, we produced a new release (1.1.3) of DMTCP (Distributed MultiThreaded CheckPointing). Its web page is at http://dmtcp.sourceforge.net/ . We (the developers of DMTCP) have tried to test this version of DMTCP carefully on OpenMPI 1.4.1, and we believe it to be working well. We would welcome feedback from any OpenMPI users who would care to test it on their own applications.

The DMTCP package provides an alternative solution for checkpoint-restart of OpenMPI computations. Using it is as simple as:

  dmtcp_checkpoint mpirun ./hello_mpi
  # Manually checkpoint from any other terminal
  dmtcp_command --checkpoint
  # Execute the restart script, which uses the checkpoint images that were generated.
  ./dmtcp_restart_script.sh

DMTCP works by creating a separate, stateless checkpoint coordinator, independent of OpenMPI's orterun. All OpenMPI processes are then checkpointed, including orterun. At restart time, a new DMTCP checkpoint coordinator can be used.

DMTCP is transparent and runs entirely in user space. There is no modification to the MPI application binary, to OpenMPI, or to the operating system kernel. DMTCP also supports a dmtcpaware interface (application-initiated checkpoints) and numerous other features. At this time, DMTCP supports only Ethernet (TCP/IP) and shared memory for transport. We are looking at supporting the InfiniBand transport layer in the future.

Finally, a bit of history. DMTCP began with the goal of checkpointing distributed desktop applications. We recognize the fine checkpoint-restart solution that already exists in OpenMPI: a checkpoint-restart service on top of BLCR. We offer DMTCP as an alternative for some unusual situations, such as when the end user does not have the privilege to add the BLCR kernel module. We are eager to gain feedback from the OpenMPI community.

Thanks,
DMTCP Developers
Re: [OMPI users] Test OpenMPI on a cluster
It seems your OpenMPI installation is not PBS-aware. Either reinstall OpenMPI configured for PBS (and then you don't even need -np 10), or, as Constantinos says, find the PBS nodefile (its path is in the $PBS_NODEFILE environment variable inside the job) and pass that to mpirun.

On Sat, 2010-01-30 at 18:45 -0800, Tim wrote:
> Hi,
>
> I am learning MPI on a cluster. Here is one simple example. I expect the
> output would show responses from different nodes, but they all respond from
> the same node, node062. I just wonder why, and how I can actually get
> reports from different nodes to show that MPI actually distributes
> processes to different nodes? Thanks and regards!
>
> ex1.c
>
> /* test of MPI */
> #include "mpi.h"
> #include <stdio.h>
> #include <string.h>
>
> int main(int argc, char **argv)
> {
>     char idstr[32];
>     char buff[128];
>     char processor_name[MPI_MAX_PROCESSOR_NAME];
>     int numprocs, myid, i, namelen;
>     MPI_Status stat;
>
>     MPI_Init(&argc, &argv);
>     MPI_Comm_size(MPI_COMM_WORLD, &numprocs);
>     MPI_Comm_rank(MPI_COMM_WORLD, &myid);
>     MPI_Get_processor_name(processor_name, &namelen);
>
>     if (myid == 0)
>     {
>         printf("WE have %d processors\n", numprocs);
>         for (i = 1; i < numprocs; i++)
>         {
>             sprintf(buff, "Hello %d", i);
>             MPI_Send(buff, 128, MPI_CHAR, i, 0, MPI_COMM_WORLD);
>         }
>         for (i = 1; i < numprocs; i++)
>         {
>             MPI_Recv(buff, 128, MPI_CHAR, i, 0, MPI_COMM_WORLD, &stat);
>             printf("%s\n", buff);
>         }
>     }
>     else
>     {
>         MPI_Recv(buff, 128, MPI_CHAR, 0, 0, MPI_COMM_WORLD, &stat);
>         sprintf(idstr, " Processor %d at node %s ", myid, processor_name);
>         strcat(buff, idstr);
>         strcat(buff, "reporting for duty\n");
>         MPI_Send(buff, 128, MPI_CHAR, 0, 0, MPI_COMM_WORLD);
>     }
>     MPI_Finalize();
> }
>
> ex1.pbs
>
> #!/bin/sh
> #
> # This is an example script example.sh
> #
> # These commands set up the Grid Environment for your job:
> #PBS -N ex1
> #PBS -l nodes=10:ppn=1,walltime=1:10:00
> #PBS -q dque
>
> # export OMP_NUM_THREADS=4
>
> mpirun -np 10 /home/tim/courses/MPI/examples/ex1
>
> compile and run:
>
> [tim@user1 examples]$ mpicc ./ex1.c -o ex1
> [tim@user1 examples]$ qsub ex1.pbs
> 35540.mgt
> [tim@user1 examples]$ nano ex1.o35540
>
> Begin PBS Prologue Sat Jan 30 21:28:03 EST 2010 1264904883
> Job ID:   35540.mgt
> Username: tim
> Group:    Brown
> Nodes:    node062 node063 node169 node170 node171 node172 node174 node175
>           node176 node177
> End PBS Prologue Sat Jan 30 21:28:03 EST 2010 1264904883
>
> WE have 10 processors
> Hello 1 Processor 1 at node node062 reporting for duty
> Hello 2 Processor 2 at node node062 reporting for duty
> Hello 3 Processor 3 at node node062 reporting for duty
> Hello 4 Processor 4 at node node062 reporting for duty
> Hello 5 Processor 5 at node node062 reporting for duty
> Hello 6 Processor 6 at node node062 reporting for duty
> Hello 7 Processor 7 at node node062 reporting for duty
> Hello 8 Processor 8 at node node062 reporting for duty
> Hello 9 Processor 9 at node node062 reporting for duty
>
> Begin PBS Epilogue Sat Jan 30 21:28:11 EST 2010 1264904891
> Job ID:    35540.mgt
> Username:  tim
> Group:     Brown
> Job Name:  ex1
> Session:   15533
> Limits:    neednodes=10:ppn=1,nodes=10:ppn=1,walltime=01:10:00
> Resources: cput=00:00:00,mem=420kb,vmem=8216kb,walltime=00:00:03
> Queue:     dque
> Account:
> Nodes:     node062 node063 node169 node170 node171 node172 node174 node175
>            node176 node177
> Killing leftovers...
>
> End PBS Epilogue Sat Jan 30 21:28:11 EST 2010 1264904891

___
users mailing list
us...@open-mpi.org
http://www.open-mpi.org/mailman/listinfo.cgi/users
Re: [OMPI users] Test OpenMPI on a cluster
Tim wrote:
> Hi,
>
> I am learning MPI on a cluster. Here is one simple example. I expect the
> output would show responses from different nodes, but they all respond from
> the same node, node062. I just wonder why, and how I can actually get
> reports from different nodes to show that MPI actually distributes
> processes to different nodes? Thanks and regards!
>
> mpirun -np 10 /home/tim/courses/MPI/examples/ex1

Try running your program with the following:

  mpirun -np 10 -machinefile machines /home/tim/courses/MPI/examples/ex1

where 'machines' is a file containing the names of your nodes (one per line):

node063
node064
...
node177

HTH,

--
Constantinos Makassikis
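Inside a PBS job there is no need to write such a machine file by hand: PBS itself writes one and publishes its path in the $PBS_NODEFILE environment variable. The sketch below simulates that nodefile with a temporary file (the node names are taken from the job output above; outside a real PBS job, $PBS_NODEFILE is of course not set), deduplicates it into a 'machines' file, and prints the mpirun line that ex1.pbs would then use.

```shell
# Simulate the file PBS would point to via $PBS_NODEFILE
# (one hostname per line, repeated once per allocated slot).
PBS_NODEFILE=$(mktemp)
printf 'node062\nnode063\nnode169\n' > "$PBS_NODEFILE"

# Build a machine file with one unique hostname per line.
sort -u "$PBS_NODEFILE" > machines
cat machines

# Inside a real job script, the launch line would then be:
echo mpirun -np 10 -machinefile machines /home/tim/courses/MPI/examples/ex1

rm -f "$PBS_NODEFILE"
```

With a PBS-aware OpenMPI build, none of this is necessary: mpirun reads the allocation directly from the scheduler.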
[OMPI users] Create group in a non-collective way
Hi,

In my code I need to specify that some processes create a group. Now, in general, the way of doing that is (correct me if I'm wrong):

int ranks[] = { 1, 2, 3 };
int rank;
MPI_Group world_group = MPI_GROUP_NULL;
MPI_Group subgroup = MPI_GROUP_NULL;
MPI_Comm subcomm = MPI_COMM_NULL;

MPI_Comm_rank(MPI_COMM_WORLD, &rank);                 // local operation
MPI_Comm_group(MPI_COMM_WORLD, &world_group);         // local operation
MPI_Group_incl(world_group, 3, ranks, &subgroup);     // local operation
MPI_Comm_create(MPI_COMM_WORLD, subgroup, &subcomm);  // collective operation on MPI_COMM_WORLD

if (rank > 0 && rank < 4) {
    // do something with subcomm
}

// cleanup

Is there any way to create the communicator inside the if?

Thanks