Does this issue apply to your environment?

https://upc-bugs.lbl.gov/blcr/doc/html/FAQ.html#prelink

We've been running BLCR in test for some time.  We haven't encountered this
issue on our platform (Ubuntu 12.04.01), but our systems are extremely
homogeneous.
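
One rough way to check whether prelinking differs between nodes is to compare
checksums of the shared libraries the program loads. This is a minimal sketch,
assuming Python 3 and `ldd` are available on each node; the function names are
hypothetical helpers and are not part of BLCR or Slurm.

```python
import hashlib
import re
import subprocess
import sys


def parse_ldd_paths(ldd_output):
    """Extract the absolute library paths from the text printed by `ldd`."""
    paths = []
    for line in ldd_output.splitlines():
        # Typical lines look like:
        #   libc.so.6 => /lib/x86_64-linux-gnu/libc.so.6 (0x...)
        #   /lib64/ld-linux-x86-64.so.2 (0x...)
        # Entries without a path (e.g. linux-vdso, "not found") are skipped.
        match = re.search(r"(/\S+)", line)
        if match:
            paths.append(match.group(1))
    return paths


def sha256_of(path):
    """Return the SHA-256 hex digest of a file, read in chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(65536), b""):
            digest.update(chunk)
    return digest.hexdigest()


def library_checksums(binary):
    """Map each library path reported by `ldd binary` to its checksum."""
    output = subprocess.run(
        ["ldd", binary], capture_output=True, text=True, check=True
    ).stdout
    return {path: sha256_of(path) for path in parse_ldd_paths(output)}


if __name__ == "__main__" and len(sys.argv) > 1:
    # Usage (hypothetical script name): python3 libsums.py /path/to/program
    for path, digest in sorted(library_checksums(sys.argv[1]).items()):
        print(digest, path)
```

Run it against the same program on two nodes and diff the output. Differing
checksums for the same library may point to per-node prelinking, although they
can also just mean the nodes have different package versions installed.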

Hope this helps

M


On Thu, Oct 10, 2013 at 12:18 AM, Damien François <
[email protected]> wrote:

>
> Hello,
>
> I am seeing strange behavior when using slurm-2.6.0+blcr-0.8.5 and I was
> wondering whether this is normal or if anyone has any advice to offer.
>
> When I submit a job (a batch script with one srun_cr step running a simple
> single-threaded, dynamically linked program), then checkpoint it and stop it
> with scontrol checkpoint vacate, I am able to restart it with no problem only
> if it is reallocated to the same node it previously ran on.
>
> If the job is restarted on another node, it starts and then immediately
> stops without any error message, either in stdout/stderr or in the log.
> Slurm seems to think everything went OK even though the program did not
> complete its execution.
>
> If I use the BLCR commands inside the script (cr_run and cr_restart),
> everything works fine whichever node it restarts on.
>
> Has anyone faced the same problem?
>
> Thanks in advance,
>
> damien




-- 
Hey! Somebody punched the foley guy!
   - Crow, MST3K ep. 508
