Let me poke at it a bit tomorrow - we should be able to avoid the abort. It’s a
bug if we can’t.
> On Jun 26, 2017, at 7:39 PM, Tim Burgess wrote:
Hi Ralph,

Thanks for the quick response.

Just tried again not under slurm, but the same result... (though I
just did kill -9 orted on the remote node this time)

Any ideas? Do you think my multiple-mpirun idea is worth trying?

Cheers,
Tim
```
[user@bud96 mpi_resilience]$
```
Hi Ralph, George,
Thanks very much for getting back to me. Alas, neither of these
options seems to accomplish the goal. Both in OpenMPI v2.1.1 and on a
recent master (7002535), with slurm's "--no-kill" and openmpi's
"--enable-recovery", once the node reboots one gets the following
error:
```