Hi Ralph,

I checked the errors. I do not understand what the following means:

The session directory location could not be parsed.
ompi-checkpoint attempted to use the session directory:
/tmp/openmpi-sessions-ndesai@vcainternmpi01_0

I opened the /tmp/openmpi-sessions-ndesai directory and various directories have been created there.

Also, when I run the MPI program, I get the following errors before the program starts running correctly:

[ndesai@vcainternmpi01 work]$ mpirun -am ft-enable-cr --np 16 ./DecoderTest ../../decoder/test.ini
[vcainternmpi01:25341] mca: base: component_find: unable to open /home/ndesai/mpicr/lib/openmpi/mca_crs_blcr: libcr.so.0: cannot open shared object file: No such file or directory (ignored)
[vcainternmpi01:25342] mca: base: component_find: unable to open /home/ndesai/mpicr/lib/openmpi/mca_crs_blcr: libcr.so.0: cannot open shared object file: No such file or directory (ignored)
[vcainternmpi01:25343] mca: base: component_find: unable to open /home/ndesai/mpicr/lib/openmpi/mca_crs_blcr: libcr.so.0: cannot open shared object file: No such file or directory (ignored)
[vcainternmpi01:25344] mca: base: component_find: unable to open /home/ndesai/mpicr/lib/openmpi/mca_crs_blcr: libcr.so.0: cannot open shared object file: No such file or directory (ignored)
[vcainternmpi01:25347] mca: base: component_find: unable to open /home/ndesai/mpicr/lib/openmpi/mca_crs_blcr: libcr.so.0: cannot open shared object file: No such file or directory (ignored)
[vcainternmpi01:25354] mca: base: component_find: unable to open /home/ndesai/mpicr/lib/openmpi/mca_crs_blcr: libcr.so.0: cannot open shared object file: No such file or directory (ignored)
[vcainternmpi01:25356] mca: base: component_find: unable to open /home/ndesai/mpicr/lib/openmpi/mca_crs_blcr: libcr.so.0: cannot open shared object file: No such file or directory (ignored)
[vcainternmpi01:25337] mca: base: component_find: unable to open /home/ndesai/mpicr/lib/openmpi/mca_crs_blcr: libcr.so.0: cannot open shared object file: No such file or directory (ignored)
[vcainternmpi01:25338] mca: base: component_find: unable to open /home/ndesai/mpicr/lib/openmpi/mca_crs_blcr: libcr.so.0: cannot open shared object file: No such file or directory (ignored)
[vcainternmpi01:25339] mca: base: component_find: unable to open /home/ndesai/mpicr/lib/openmpi/mca_crs_blcr: libcr.so.0: cannot open shared object file: No such file or directory (ignored)
[vcainternmpi01:25340] mca: base: component_find: unable to open /home/ndesai/mpicr/lib/openmpi/mca_crs_blcr: libcr.so.0: cannot open shared object file: No such file or directory (ignored)
[vcainternmpi01:25355] mca: base: component_find: unable to open /home/ndesai/mpicr/lib/openmpi/mca_crs_blcr: libcr.so.0: cannot open shared object file: No such file or directory (ignored)
[vcainternmpi01:25359] mca: base: component_find: unable to open /home/ndesai/mpicr/lib/openmpi/mca_crs_blcr: libcr.so.0: cannot open shared object file: No such file or directory (ignored)
[vcainternmpi01:25357] mca: base: component_find: unable to open /home/ndesai/mpicr/lib/openmpi/mca_crs_blcr: libcr.so.0: cannot open shared object file: No such file or directory (ignored)
[vcainternmpi01:25358] mca: base: component_find: unable to open /home/ndesai/mpicr/lib/openmpi/mca_crs_blcr: libcr.so.0: cannot open shared object file: No such file or directory (ignored)
[vcainternmpi01:25362] mca: base: component_find: unable to open /home/ndesai/mpicr/lib/openmpi/mca_crs_blcr: libcr.so.0: cannot open shared object file: No such file or directory (ignored)

I also checked the mca-params.conf file and all it contained were comments. Do I have to make any changes there to get correct snapshots?

Thanks a lot,
Neel.

On Fri, May 31, 2013 at 5:24 PM, Ralph Castain <r...@open-mpi.org> wrote:

> Did you check the items on the list given in the error? I'm no expert on
> ompi-checkpoint, but the error means that one of those conditions isn't
> being met.
>
>
> On May 31, 2013, at 4:54 PM, Neel Sunil Desai <neel.de...@colorado.edu>
> wrote:
>
> Hi Ralph,
>
> Thanks for the help. The path and ld_path were not set to the correct
> location. I was able to execute the ompi-checkpoint command. But, I got the
> following error.
>
> [ndesai@vcainternmpi01 ~]$ ompi-checkpoint 1803
> --------------------------------------------------------------------------
> Error: Unable to find the requested, active MPIRUN process on this machine.
> This could be due to one of the following:
> - The jobid specified by the '--hnp-jobid' option is not correct.
> - The PID specified (1803) is not that of an active MPIRUN.
> - The application with this PID is not checkpointable
> - The application with this PID is not an Open MPI application.
> - The session directory location could not be parsed.
> ompi-checkpoint attempted to use the session directory:
> /tmp/openmpi-sessions-ndesai@vcainternmpi01_0
>
> Thanks,
> Neel.
>
> On Fri, May 31, 2013 at 4:34 PM, Ralph Castain <r...@open-mpi.org> wrote:
>
>> Check that your path and ld_library_path are set to point to the
>> directory where you installed the version you built (the --prefix=<> you
>> provided).
>>
>> On May 31, 2013, at 4:31 PM, Neel Sunil Desai <neel.de...@colorado.edu>
>> wrote:
>>
>> Hi Ralph,
>>
>> I did install open mpi with the --with-ft=cr option.
>>
>> Thanks,
>> Neel.
>>
>> On Fri, May 31, 2013 at 4:25 PM, Ralph Castain <r...@open-mpi.org> wrote:
>>
>>> Okay, it should work in that version. It sounds like you didn't
>>> configure OMPI with the --with-ft=cr option - yes? Take a look at
>>> "./configure -h" for the ft-related options and ensure you build what you
>>> need. C/R support is not built by default.
>>>
>>>
>>> On May 31, 2013, at 3:59 PM, Neel Sunil Desai <neel.de...@colorado.edu>
>>> wrote:
>>>
>>> Open MPI 1.5.4
>>>
>>> On Fri, May 31, 2013 at 3:31 PM, Ralph Castain <r...@open-mpi.org> wrote:
>>>
>>>> What OMPI version?
>>>>
>>>> On May 31, 2013, at 3:17 PM, Neel Sunil Desai <neel.de...@colorado.edu>
>>>> wrote:
>>>>
>>>> > Hi,
>>>> >
>>>> > I forgot to add. I watched the video of Joshua Hursey and when I type
>>>> > ompi_info | grep FT, I get FT Checkpoint Support: no ( checkpoint thread :
>>>> > no). I do not get anything when I type ompi_info | grep crs.
>>>> >
>>>> > Thanks,
>>>> > Neel.
>>>> > _______________________________________________
>>>> > users mailing list
>>>> > us...@open-mpi.org
>>>> > http://www.open-mpi.org/mailman/listinfo.cgi/users
>>>>
>>>>
>>>
>>>
>>
>>
>
>
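
The repeated libcr.so.0 messages in the mpirun output mean the dynamic loader could not locate BLCR's library when Open MPI tried to open the mca_crs_blcr component, which fits the advice in the thread to check the library path. A minimal sketch of such a check follows. The helper name find_lib_in_path is hypothetical, and it only inspects a colon-separated directory list such as LD_LIBRARY_PATH; the loader also consults other locations (for example the ld.so cache), so a miss here is a hint rather than proof.

```shell
#!/bin/sh
# Hypothetical helper (not part of Open MPI or BLCR): print the first
# directory entry in a colon-separated search path that contains the
# named library file, and return non-zero if none does.
find_lib_in_path() {
    lib_name=$1
    rest="$2:"                 # trailing colon so the last entry is processed
    while [ -n "$rest" ]; do
        dir=${rest%%:*}        # first entry of the remaining list
        rest=${rest#*:}        # drop that entry and its colon
        if [ -n "$dir" ] && [ -e "$dir/$lib_name" ]; then
            printf '%s\n' "$dir/$lib_name"
            return 0
        fi
    done
    return 1
}

# Look for the library named in the error messages.
find_lib_in_path libcr.so.0 "${LD_LIBRARY_PATH:-}" \
    || echo "libcr.so.0 not found in LD_LIBRARY_PATH"
```

If the check reports a miss, adding the lib directory of the BLCR installation to LD_LIBRARY_PATH before running mpirun is one thing to try; that directory is not given in the thread.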