(Sorry for the late reply)
On Jun 7, 2010, at 4:48 AM, Nguyen Kim Son wrote:
> Hello,
>
> I'm trying to get commands like orte-checkpoint and orte-restart to work, but
> I am getting some errors that I don't have any clue about.
>
> BLCR (0.8.2) apparently works fine, and I have installed Open MPI 1.4.2 from
> source with the BLCR option.
> The command
> mpirun -np 4 -am ft-enable-cr ./checkpoint_test
> seemed OK but
> orte-checkpoint --term PID_of_checkpoint_test (obtained from ps -ef | grep
> mpirun)
> does not return and does not print any errors!
You mean the PID of 'mpirun', right?
Does it checkpoint correctly without the '--term' argument?
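Something along these lines is what I have in mind. It is only a rough sketch: the helper name checkpoint_job is made up for illustration, and it assumes orte-checkpoint is on your PATH and that you pass the PID of 'mpirun' itself, not of one of the application processes.

```shell
# Hypothetical helper: checkpoint a running job given the PID of mpirun.
checkpoint_job() {
    pid="$1"
    # Reject empty or non-numeric input before calling the tool.
    case "$pid" in
        ''|*[!0-9]*)
            echo "usage: checkpoint_job PID_of_mpirun" >&2
            return 2
            ;;
    esac
    # Plain checkpoint first; add --term only once this returns cleanly.
    orte-checkpoint "$pid"
}

# Example (assumes pgrep is available and a single mpirun is running):
# checkpoint_job "$(pgrep -n -x mpirun)"
```

If the plain checkpoint returns but the '--term' variant hangs, that narrows the problem down considerably.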
Can you try the v1.5 release candidate to see if you have the same problem?
http://www.open-mpi.org/software/ompi/v1.5/
What MCA parameters do you have set in your environment?
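In case it helps, here is one way to collect that information. This is a sketch under a couple of assumptions: the function names are made up, MCA parameters set through the environment use the OMPI_MCA_ prefix, and the per-user parameter file is in its default location ($HOME/.openmpi/mca-params.conf).

```shell
# List MCA parameters set through the environment (OMPI_MCA_<name>=<value>).
list_mca_env() {
    env | grep '^OMPI_MCA_' || echo "no OMPI_MCA_* variables set"
}

# Print an MCA parameter file if it exists (defaults to the per-user file).
show_mca_file() {
    f="${1:-$HOME/.openmpi/mca-params.conf}"
    if [ -f "$f" ]; then
        cat "$f"
    else
        echo "no $f"
    fi
}

list_mca_env
show_mca_file
```

Sending the output of both along with your reply would be enough.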
-- Josh
>
> Then, I checked with
> ompi-ps
> this time, I obtain:
> oob-tcp: Communication retries exceeded. Can not communicate with peer
>
> Does anyone have the same problem?
> Any ideas are welcome!
> Thanks,
> Son.
>
>
> --
> -
> Son NGUYEN KIM
> Antibes 06600
> Tel: 06 48 28 37 47