Hi,
Since the upgrade from 14.3 to 14.11 was done in parallel with the upgrade
from Scientific Linux 6.5 to 6.6 (including kernel updates), I'm not 100%
sure that we really had problems with MPI.
We had problems with jobs crashing nodes while Slurm was removing the
cgroup folder for those jobs. This later turned out to be related to NFSv4
problems, but by that point we had already rebuilt MPI and other software.
What we do is point to the version in production with a "default"
symlink. This seems to ease upgrades between bug fix versions, as the
MPI lib finds the "new" Slurm at the same path as before.
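
To illustrate the symlink idea, here is a minimal sketch. It assumes each
Slurm version lives in its own directory under a common base and that MPI
was built against <base>/default. The function name, the directory layout
and the version 14.11.4 are only made up for the example, not our actual
setup.

```python
import os
import tempfile


def point_default_at(base_dir, version):
    """Repoint base_dir/default at base_dir/version (hypothetical layout)."""
    target = os.path.join(base_dir, version)
    if not os.path.isdir(target):
        raise FileNotFoundError(target)
    link = os.path.join(base_dir, "default")
    tmp_link = link + ".tmp"
    # Clear a leftover temporary link from an interrupted earlier run.
    if os.path.lexists(tmp_link):
        os.remove(tmp_link)
    # Create the new link under a temporary name, then rename it over the
    # old one, so "default" never disappears while jobs are starting.
    os.symlink(version, tmp_link)
    os.replace(tmp_link, link)
    return link


if __name__ == "__main__":
    # Demo in a throwaway directory instead of a real install prefix.
    with tempfile.TemporaryDirectory() as base:
        for v in ("14.11.3", "14.11.4"):
            os.mkdir(os.path.join(base, v))
            point_default_at(base, v)
            print(os.readlink(os.path.join(base, "default")))
```

The rename step is the point of the sketch: anything that resolves
<base>/default sees either the old or the new version, never a missing path.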
Regards,
Uwe
On 06.02.2015 at 00:27, Peter A Ruprecht wrote:
> From a message by Uwe Sauter in an earlier thread I read:
>
> ---
> Is it really necessary to re-link MPI for every bug fix release? I had
> some trouble with MPI after the upgrade 14.3 -> 14.11 but I haven't seen
> problems between bug fix releases so far…
> ---
>
> I wonder if Uwe could expand on the trouble that was encountered, or
> whether anyone else on the list ran into similar problems.
>
> I ask because some of our users have started reporting a 10x increase in
> run-times of OpenMPI jobs since we upgraded to 14.11.3 from 14.3. It's
> possible there is some other problem going on in our cluster, but all of
> our hardware checks including Infiniband diagnostics look pretty clean.
>
> Thanks for any suggestions,
> Peter