Re: [OMPI users] Openmpi Checkpoint/Restart failed

2010-12-23 Thread 孟宪军
Dear all, I have figured it out. It was a simple issue: I hadn't added the "blcr lib" to the $PATH environment variable. However, the checkpoint operation worked, while the restart operation did not succeed. It was so weird. Best regards, Xianjun Meng. On December 23, 2010 at 5:35 PM, 孟宪军

Re: [OMPI users] srun and openmpi

2010-12-23 Thread Ralph Castain
I'm not sure there is any documentation yet - not much clamor for it. :-/ It would really help if you included the error message. Otherwise, all I can do is guess, which wastes both of our time :-( My best guess is that the port reservation didn't get passed down to the MPI procs properly -

[OMPI users] srun and openmpi

2010-12-23 Thread Michael Di Domenico
Can anyone point me towards the most recent documentation for using srun and openmpi? I followed what I found on the web, enabling the MpiPorts config in slurm and using the --resv-ports switch, but I'm getting an error from openmpi during setup. I'm using Slurm 2.1.15 and Openmpi 1.5 w/PSM

Re: [OMPI users] Call to MPI_Test has large time-jitter

2010-12-23 Thread Yiannis Papadopoulos
On Fri, Dec 17, 2010 at 5:43 PM, Sashi Balasingam wrote:
> Hi,
> I recently started on an MPI-based, 'real-time', pipelined-processing application, and the application fails due to large time-jitter in sending and receiving messages. Here is the related info -
>
> 1)

Re: [OMPI users] Openmpi Checkpoint/Restart failed

2010-12-23 Thread 孟宪军
My main question is: after I finished the checkpoint operation on a simple task that ran on two machines, I can only restart it on one machine. If I run the following command to force ompi-restart to run the program on two machines: ompi-restart -hostfile ./machine_names

[OMPI users] Openmpi Checkpoint/Restart failed

2010-12-23 Thread 孟宪军
Dear all, I have been trying the checkpoint/restart function of Openmpi recently, and after several failures and checking a lot of the documentation, I am still very confused about how to configure the checkpoint/restart function. Can anybody give me a $HOME/.openmpi/mca-params.conf script and explain to me what