You can use the 'checkpoint to local disk' example to checkpoint and restart without access to a globally shared storage device. There is an example on the website that does not use a globally mounted file system:
  http://www.osl.iu.edu/research/ft/ompi-cr/examples.php#uc-ckpt-local
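Below is a rough sketch of that local-disk setup. The Python snippet only assembles an mpirun command line; it does not run it. The command enables checkpoint/restart with `-am ft-enable-cr` and passes the three MCA parameters quoted further down in this thread using `-mca name value`. The helper name `build_mpirun_args`, the program `./my_app`, and the process count are hypothetical placeholders. We also assume, though this thread does not state it, that `snapc_base_store_in_place=0` makes each node write its local snapshot under `crs_base_snapshot_dir` and that those snapshots are then gathered into `snapc_base_global_snapshot_dir`.

```python
import shlex


def build_mpirun_args(nprocs, program, local_dir, global_dir):
    """Hypothetical helper: build (but do not run) an mpirun argument list
    that enables checkpoint/restart and keeps local snapshots on each node."""
    params = {
        # Assumed meaning: 0 = write snapshots to local disk first, then gather them.
        "snapc_base_store_in_place": "0",
        # Per-node directory for local snapshots.
        "crs_base_snapshot_dir": local_dir,
        # Directory on the launching node where the global snapshot is assembled.
        "snapc_base_global_snapshot_dir": global_dir,
    }
    args = ["mpirun", "-np", str(nprocs), "-am", "ft-enable-cr"]
    for name, value in params.items():
        args += ["-mca", name, value]
    args.append(program)
    return args


if __name__ == "__main__":
    cmd = build_mpirun_args(
        4,
        "./my_app",  # hypothetical application
        "/home/andreea/checkpoints/local",
        "/home/andreea/checkpoints/global",
    )
    print(shlex.join(cmd))
```

Setting the same parameters in an MCA parameter file should have the same effect. The command-line form simply keeps the whole configuration visible in one place. After the job starts, you would request a checkpoint by running `ompi-checkpoint` with the PID of `mpirun`.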
What version of Open MPI are you using? This functionality is known to be broken on the v1.3/1.4 branches, per the ticket below:
  https://svn.open-mpi.org/trac/ompi/ticket/2139

Try the nightly snapshot of the 1.5 branch or the development trunk, and see if this issue still occurs.

-- Josh

On Feb 8, 2010, at 8:35 AM, Andreea Costea wrote:

> I asked this question because checkpointing to NFS is successful, but
> checkpointing without a mounted file system or shared storage throws this
> warning and error:
>
> WARNING: Could not preload specified file: File already exists.
> Fileset: /home/andreea/checkpoints/global/ompi_global_snapshot_7426.ckpt/0
> Host: X
>
> Will continue attempting to launch the process.
>
>
> filem:rsh: wait_all(): Wait failed (-1)
> [[62871,0],0] ORTE_ERROR_LOG: Error in file snapc_full_global.c at line 1054
>
> even if I set the MCA parameters like this:
> snapc_base_store_in_place=0
> crs_base_snapshot_dir=/home/andreea/checkpoints/local
> snapc_base_global_snapshot_dir=/home/andreea/checkpoints/global
> and the nodes can connect through ssh without a password.
>
> Thanks,
> Andreea
>
> On Mon, Feb 8, 2010 at 12:59 PM, Andreea Costea <andre.cos...@gmail.com>
> wrote:
> Hi,
>
> Let's say I have an MPI application running on several hosts. Is there any
> way to checkpoint this application without shared storage between
> the nodes?
> I already took a look at the examples here
> http://www.osl.iu.edu/research/ft/ompi-cr/examples.php, but it seems that in
> both cases there is a globally mounted file system.
>
> Thanks,
> Andreea
>
>
> _______________________________________________
> users mailing list
> us...@open-mpi.org
> http://www.open-mpi.org/mailman/listinfo.cgi/users