On Dec 19, 2009, at 7:42 AM, Jean Potsam wrote:

Hi Everyone,
I am trying to checkpoint an MPI application running on multiple nodes. However, I get the following error messages when I trigger the checkpointing process:

Error: expected_component: PID information unavailable!
Error: expected_component: Component Name information unavailable!

I am using Open MPI 1.3 and BLCR 0.8.1.

Can you try the v1.4 release and see if the problem persists?


I execute my application as follows:

mpirun -am ft-enable-cr -np 3 --hostfile hosts gol

My question:

Does Open MPI with BLCR support checkpointing an MPI application that runs across multiple nodes? If so, can you provide me with some information on how to achieve this?

Open MPI is able to checkpoint a multi-node application (that's what it was designed to do). There are some examples at the link below:
  http://www.osl.iu.edu/research/ft/ompi-cr/examples.php
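
As a rough sketch of the usual workflow (not a substitute for the examples at the link above): the job is launched with the ft-enable-cr parameter set, a checkpoint is requested by passing the PID of mpirun to ompi-checkpoint, and the job is later restarted with ompi-restart from the global snapshot reference that ompi-checkpoint reports. The values below are assumptions for illustration: NP, HOSTFILE and APP are copied from the command in this thread, MPIRUN_PID is a placeholder, and the snapshot reference name only follows the form these releases typically print. The script prints the commands instead of running them.

```shell
#!/bin/sh
# Dry-run sketch: builds and prints the three commands, runs nothing.

NP=3               # process count, from the command in this thread
HOSTFILE=hosts     # hostfile, from the command in this thread
APP=gol            # application binary, from the command in this thread
MPIRUN_PID=12345   # placeholder: the PID of the running mpirun process

# 1. Launch the job with checkpoint/restart support enabled.
LAUNCH_CMD="mpirun -am ft-enable-cr -np ${NP} --hostfile ${HOSTFILE} ${APP}"

# 2. From another terminal on the node where mpirun runs, request a
#    checkpoint of the whole job by giving ompi-checkpoint mpirun's PID.
CKPT_CMD="ompi-checkpoint ${MPIRUN_PID}"

# 3. Restart from the global snapshot reference printed in step 2.
#    Assumption: the reference has the form ompi_global_snapshot_<PID>.ckpt;
#    use whatever name ompi-checkpoint actually reports.
RESTART_CMD="ompi-restart ompi_global_snapshot_${MPIRUN_PID}.ckpt"

printf '%s\n' "$LAUNCH_CMD" "$CKPT_CMD" "$RESTART_CMD"
```

Note that ompi-checkpoint takes the PID of mpirun itself, not the PID of an individual application process.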

-- Josh


Cheers,

Jean.

