Unfortunately, our checkpoint/restart (c/r) support is deprecated because we
lost the person who wrote and maintained it. I'm afraid we can't really help
with it, and c/r support will not be included in future releases unless
someone becomes available to maintain it again.
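
One thing that may still be worth checking: the restart error quoted below
names the "none" checkpoint/restart service, which might mean that the BLCR
component was not used (or not built) in this installation. That is only a
guess from the message text, not something confirmed in this thread. The
sketch below filters ompi_info output for CRS components; the function name
list_crs_components is a hypothetical helper, not part of Open MPI, and it
assumes the usual "MCA <framework>: <component> (...)" line format.

```shell
#!/bin/sh
# Hypothetical helper (not an Open MPI tool): read `ompi_info`-style
# output on stdin and print the name of each CRS component it lists.
list_crs_components() {
    # Assumed line format: "MCA crs: blcr (MCA v2.0, API v2.0, ...)"
    awk '$1 == "MCA" && $2 == "crs:" { print $3 }'
}

# Usage on a machine with Open MPI installed:
#   ompi_info | list_crs_components
```

If "blcr" does not appear in the output on every node in the hostfile, that
would at least be consistent with the error, though it would not prove the
cause.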


On Jan 13, 2013, at 4:37 AM, Jerry Mersel <jerry.mer...@weizmann.ac.il> wrote:

> 
> Checkpointing and restarting Open MPI applications doesn't work for me.
> 
> I have a Red Hat 5U6 system with BLCR version 0.8.4 and Open MPI
> version 1.6.3.
> 
> I have a simple parallel application that I want to checkpoint and restart.
> 
> I see that the blcr modules are loaded (with lsmod).
> 
> I run:
> mpirun  -np 1 -hostfile hostfile -am ft-enable-cr  EXECUTABLE
> ompi-checkpoint -v -s <PID of mpirun>
> 
> Then I kill mpirun.
> 
> Then I run:
> ompi-restart -v ompi_global_snapshot_<PID>.ckpt
> 
> Here are my results:
> 
> Error: Unable to obtain the proper restart command to restart from the 
>        checkpoint file (opal_snapshot_0.ckpt). Returned -1.
>        Check the installation of the none checkpoint/restart service
>        on all of the machines in your system.
> 
> 
> 
> If I use the BLCR utilities (cr_run, cr_checkpoint, cr_restart) directly,
> it works on the local machine, but not on more than one machine.
> 
> Please help me with this.
> 
> Thank you.
> 
> 
> 
> 
> 
> With Blessings, always,
> 
>    Jerry Mersel
> 
>    System Administrator
>    IT Infrastructure Branch | Division of Information Systems
>     Weizmann Institute of Science
>     Rehovot 76100, Israel
>   
>    Tel:  +972-8-9342363
>    
> _______________________________________________
> users mailing list
> us...@open-mpi.org
> http://www.open-mpi.org/mailman/listinfo.cgi/users
