So I recently hit this same problem while doing some scalability
testing. I experimented with adding the --no-restore-pid option, but
found the same problem as you mention. Unfortunately, the problem is
with BLCR, not Open MPI.
BLCR will restart the process with a new PID, but the value ret
Hi
I am using open mpi v1.3.4 with BLCR 0.8.2. I have been testing my
openmpi based program on a 3-node cluster (each node is a Intel Nehalem
based dual quad core) and I have been successful in checkpointing and
restarting the program successfully multiple times.
Recently I moved to a 15 node