Unfortunately, our checkpoint/restart (c/r) support is deprecated because we lost the person who wrote and maintained it. I'm afraid we can't really help with it, and c/r support will not be included in future releases unless someone becomes available to support it again.

On Jan 13, 2013, at 4:37 AM, Jerry Mersel <jerry.mer...@weizmann.ac.il> wrote:

> checkpointing and restarting openmpi applications don't work for me.
>
> I have a redhat version 5U6 system with blcr checkpointing version 0.8.4
> and openmpi version 1.6.3.
>
> I have a simple parallel application that I want to checkpoint and restart.
>
> I see that the blcr modules are loaded (with lsmod).
>
> I run:
> mpirun -np 1 -hostfile hostfile -am ft-enable-cr EXECUTABLE
> ompi-checkpoint -v -s <PID of mpirun>
>
> then I kill mpirun.
>
> then:
> ompi-restart -v ompi_global_snapshot_<PID>.ckpt
>
> here is my results:
>
> Error: Unable to obtain the proper restart command to restart from the
> checkpoint file (opal_snapshot_0.ckpt). Returned -1.
> Check the installation of the none checkpoint/restart service
> on all of the machines in your system.
>
> If I try using the blcr utilities (cr_run, cr_checkpoint, cr_run) then it
> runs on the local machine, it won't on more then one machine.
>
> Please help me with this.
>
> Thank you.
>
> With Blessings, always,
>
> Jerry Mersel
>
> System Administrator
> IT Infrastructure Branch | Division of Information Systems
> Weizmann Institute of Science
> Rehovot 76100, Israel
>
> Tel: +972-8-9342363
>
> _______________________________________________
> users mailing list
> us...@open-mpi.org
> http://www.open-mpi.org/mailman/listinfo.cgi/users