Re: [OMPI users] ompi-restart using different nodes

2009-12-09 Thread Jonathan Ferland
Hi Josh, Thanks for helping. That solved the problem!!! cheers, Jonathan Josh Hursey wrote: So I tried to reproduce this problem today, and everything worked fine for me using the trunk. I haven't tested v1.3/v1.4 yet. I tried checkpointing with one hostfile then restarting with each of

Re: [OMPI users] ompi-restart using different nodes

2009-12-09 Thread Josh Hursey
So I tried to reproduce this problem today, and everything worked fine for me using the trunk. I haven't tested v1.3/v1.4 yet. I tried checkpointing with one hostfile then restarting with each of the following: - No hostfile - a hostfile with completely different machines - a hostfile

Re: [OMPI users] ompi-restart using different nodes

2009-12-08 Thread Jonathan Ferland
I did the same test using 1.3.4 and still see the same issue. I also tried to use the tm interface instead of specifying the hostfile, with the same result. Thanks, Jonathan Josh Hursey wrote: Though I do not test this scenario (using hostfiles) very often, it used to work. The ompi-restart command

Re: [OMPI users] ompi-restart using different nodes

2009-12-02 Thread Jonathan Ferland
Hi Josh, In case it helps, I am running 1.3.3 compiled as follows: ../configure --enable-ft-thread --with-ft=cr --enable-mpi-threads --with-blcr=... --with-blcr-libdir=... --disable-openib-rdmacm --prefix= I ran my application like this: mpirun -am ft-enable-cr --hostfile host -np 2

Re: [OMPI users] ompi-restart using different nodes

2009-12-02 Thread Josh Hursey
Though I do not test this scenario (using hostfiles) very often, it used to work. The ompi-restart command takes a --hostfile (or --machinefile) argument that is passed directly to the mpirun command. I wonder if something broke recently with this handoff. I can certainly checkpoint with
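
The handoff described above can be sketched as a small dry-run helper that only assembles the ompi-restart command line, without executing it. The --hostfile option is the one named in the message; the hostfile name new_hosts and the snapshot reference ompi_global_snapshot_1234.ckpt are hypothetical placeholders, and whether a restart on different nodes succeeds may depend on the Open MPI version, as the thread suggests.

```shell
#!/bin/sh
# Dry-run sketch: print an ompi-restart command line, do not execute it.
#   $1  hostfile to restart on (hypothetical name; empty means no hostfile)
#   $2  global snapshot reference (hypothetical name)
build_restart_cmd() {
    hostfile=$1
    snapshot=$2
    if [ -n "$hostfile" ]; then
        # --hostfile is handed through to mpirun by ompi-restart
        printf 'ompi-restart --hostfile %s %s\n' "$hostfile" "$snapshot"
    else
        # No hostfile given: restart without one
        printf 'ompi-restart %s\n' "$snapshot"
    fi
}

# Example invocations with placeholder names
build_restart_cmd new_hosts ompi_global_snapshot_1234.ckpt
build_restart_cmd "" ompi_global_snapshot_1234.ckpt
```

Printing the command first makes it easy to compare the restart with and without a hostfile before running it on a real cluster.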

[OMPI users] ompi-restart using different nodes

2009-12-02 Thread Jonathan Ferland
Hi, I am trying to use BLCR checkpointing in MPI. I am currently able to run my application using a hostfile, checkpoint the run, and then restart the application using the same hostfile. What I would like to do is restart the application with a different hostfile. But this leads