Does this issue apply to your environment? https://upc-bugs.lbl.gov/blcr/doc/html/FAQ.html#prelink
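As far as I understand that FAQ entry, the concern is that prelink can leave the "same" shared library with different on-disk contents on each node, so a process checkpointed on one node may not restart cleanly on another. A rough way to look for that is to checksum the libraries your program links against on each node and compare the results. The sketch below is only an illustration: the helper names are made up here and it is not part of BLCR or Slurm. It assumes `ldd` is available on the nodes, and a mismatch is only a hint that prelink (or a package difference) is involved, not proof.

```python
import hashlib
import subprocess
from pathlib import Path


def shared_libraries(binary):
    """Return the resolved library paths that `ldd` reports for `binary`."""
    out = subprocess.run(
        ["ldd", binary], capture_output=True, text=True, check=True
    ).stdout
    paths = []
    for line in out.splitlines():
        # Typical line: "libc.so.6 => /lib/x86_64-linux-gnu/libc.so.6 (0x...)"
        if "=>" in line:
            target = line.split("=>", 1)[1].split("(")[0].strip()
            # Skip entries with no path (e.g. the vDSO) or unresolved libraries.
            if target and target != "not found":
                paths.append(target)
    return paths


def checksums(paths):
    """Map each path to the SHA-256 hex digest of the file's contents."""
    return {p: hashlib.sha256(Path(p).read_bytes()).hexdigest() for p in paths}


def differing(node_a, node_b):
    """Return the sorted paths whose checksum differs or is missing on one node."""
    return sorted(
        p for p in node_a.keys() | node_b.keys() if node_a.get(p) != node_b.get(p)
    )


# Hypothetical usage: run on each node, save the dict (e.g. as JSON),
# then compare the two results on one machine:
#   sums = checksums(shared_libraries("/path/to/your/program"))
#   print(differing(sums_from_node1, sums_from_node2))
```

If `differing` reports libraries such as libc, it may be worth checking whether prelink runs from cron on your nodes before digging into the Slurm side.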
We've been running blcr in test for some time. We haven't encountered this issue on our platform (Ubuntu 12.04.01), but our systems are extremely homogeneous.

Hope this helps

M

On Thu, Oct 10, 2013 at 12:18 AM, Damien François <[email protected]> wrote:
>
> Hello,
>
> I am seeing strange behavior when using slurm-2.6.0+blcr-0.8.5 and I was
> wondering whether this is normal or if anyone has any advice to offer.
>
> When I submit a job (batch script with one srun_cr step - monothreaded,
> dynamically-linked simple program), then checkpoint it and stop it with
> scontrol checkpoint vacate, I am able to restart with no problem only if it
> is reallocated to the same node it previously ran on.
>
> If the job is restarted on another node, it starts and then immediately
> stops without any error message, either in stdout/stderr or in the log. It
> seems to Slurm that everything went OK even though the program did not
> complete its execution.
>
> If I use the blcr commands inside the script (cr_run and cr_restart),
> everything works fine whichever node it restarts on.
>
> Did anyone face the same problem?
>
> Thanks in advance,
>
> damien

--
Hey! Somebody punched the foley guy!
- Crow, MST3K ep. 508
