Hi Josh
Thanks for the reply. Please see below ...
On 05/26/10 09:24, Josh Hursey wrote:
(Sorry for the delay, I missed the C/R question in the mail)
On May 25, 2010, at 9:35 AM, Jeff Squyres wrote:
On May 24, 2010, at 2:02 PM, Michael E. Thomadakis wrote:
| > 2) I have installed BLCR V0.8.2, but when I try to build OMPI and point to the
| > full installation, it complains it cannot find it. Note that I built BLCR with
| > GCC but I am building OMPI with Intel compilers (V11.1).
|
| Can you be more specific here?
I pointed to the installation path for BLCR, but configure complained that it
couldn't find it. If BLCR is only needed for checkpoint/restart, then we can
live without it. Is BLCR needed for suspend/resume of MPI jobs?
You mean suspend with Ctrl-Z? If so, correct -- BLCR is *only* used
for checkpoint/restart. Ctrl-Z just uses the SIGTSTP functionality.
So BLCR is used for the checkpoint/restart functionality in Open MPI.
We have a webpage with some more details and examples at the link below:
http://osl.iu.edu/research/ft/ompi-cr/
You should be able to suspend/resume an Open MPI job using
SIGSTOP/SIGCONT without the C/R functionality. We have an FAQ item that
talks about how to enable this functionality:
http://www.open-mpi.org/faq/?category=running#suspend-resume
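For illustration only, here is a minimal sketch of the plain signal mechanics behind suspend/resume, with no BLCR involved. It is not Open MPI specific: the child process below is a stand-in for a long-running job, and the function names (suspend_and_resume, demo) are hypothetical. For a real Open MPI job, the FAQ item above describes what has to be enabled and which process should receive the signal. The sketch assumes a POSIX system.

```python
import os
import signal
import subprocess
import sys
import time


def suspend_and_resume(pid, pause_seconds=0.5):
    """Stop a process with SIGSTOP, wait, then continue it with SIGCONT."""
    os.kill(pid, signal.SIGSTOP)   # process is frozen by the kernel
    time.sleep(pause_seconds)      # job makes no progress during this time
    os.kill(pid, signal.SIGCONT)   # process picks up where it left off


def demo():
    # Hypothetical stand-in for a long-running job: a child that sleeps briefly.
    child = subprocess.Popen(
        [sys.executable, "-c", "import time; time.sleep(1)"]
    )
    suspend_and_resume(child.pid)
    # The child was only paused, never killed, so it should exit normally.
    return child.wait()


if __name__ == "__main__":
    print("child exit code:", demo())
```

The point of the sketch is that stopping and continuing a process needs nothing beyond ordinary signals; checkpointing is a separate capability layered on top.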
You can combine the C/R and the SIGSTOP/SIGCONT functionality so that
when you 'suspend' a job a checkpoint is taken and the process is
stopped. You can continue the job by sending SIGCONT as normal.
Additionally, this way if the job needs to be terminated for some
reason (e.g., memory footprint, maintenance), it can be safely
terminated and restarted from the checkpoint. I have an example of how
this works at the link below:
http://osl.iu.edu/research/ft/ompi-cr/examples.php#uc-ckpt-stop
As far as C/R integration with schedulers/resource managers, I know
that the BLCR folks have been working with Torque to better integrate
Open MPI+BLCR+Torque. If this is of interest, you might want to check
with them on the progress of that project.
So suspend/resume of Open MPI jobs does not require BLCR. OK, so I will
proceed without it.
best regards,
Michael
-- Josh
--
% -------------------------------------------------------------------- \
% Michael E. Thomadakis, Ph.D. Senior Lead Supercomputer Engineer/Res \
% E-mail: miket AT tamu DOT edu Texas A&M University \
% web: http://alphamike.tamu.edu Supercomputing Center \
% Voice: 979-862-3931 Teague Research Center, 104B \
% FAX: 979-847-8643 College Station, TX 77843, USA \
% -------------------------------------------------------------------- \