From: Richard Graham <richa...@mellanox.com>
Sent: Thursday, 28 September 2017 18:09
To: Open MPI Users
Subject: Re: [OMPI users] Open MPI internal error

I just talked with George, who brought me up to speed on this particular
problem.
I would suggest a couple of things:
- Look at the HW error counters, and see if y
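The suggestion to look at the HW error counters can be acted on directly from the shell. A minimal sketch, assuming the standard Linux InfiniBand sysfs layout; the HCA name `mlx4_0` and port number 1 are placeholders, not taken from the thread:

```shell
# Inspect the per-port IB error counters exposed by the kernel.
# HCA name (mlx4_0) and port (1) are examples -- list
# /sys/class/infiniband/ to find yours.
d=/sys/class/infiniband/mlx4_0/ports/1/counters
if [ -d "$d" ]; then
    # Non-zero symbol_error / port_rcv_errors often point at a bad
    # cable, connector, or switch port.
    for c in "$d"/*; do
        printf '%s: %s\n' "$(basename "$c")" "$(cat "$c")"
    done
else
    echo "counters directory not found: $d (no IB HCA, or a different device name)"
fi
```

`perfquery` from the infiniband-diags package reports the same counters and can also query remote ports on the fabric.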
John,
On the ULFM mailing list you pointed out, we converged toward a hardware
issue. Resources associated with the dead process were not correctly freed,
and follow-up processes on the same setup would inherit issues related to
these lingering messages. However, keep in mind that the setup was
PS: Before you reboot a compute node, have you run 'ibdiagnet'?
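For reference, a hedged sketch of that pre-reboot check. ibdiagnet's flags differ between the classic ibutils version and Mellanox's ibdiagnet2, so check `ibdiagnet -h` on your system first:

```shell
# Run a fabric-wide diagnostic before rebooting the node; the classic
# ibdiagnet writes its report files under /tmp by default.
if command -v ibdiagnet >/dev/null 2>&1; then
    ibdiagnet
else
    echo "ibdiagnet not found (typically provided by the ibutils package)"
fi
```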
On 28 September 2017 at 11:17, John Hearns wrote:
>
> Google turns this up:
> https://groups.google.com/forum/#!topic/ulfm/OPdsHTXF5ls
>
>
On 28 September 2017 at 01:26, Ludovic Raess wrote:
> Hi,
>
>
> we have an issue on our 32-node Linux cluster regarding the usage of Open
> MPI in an InfiniBand dual-rail configuration (2 IB
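Since the report mentions a dual-rail setup, one common debugging step (not suggested in the thread itself, just a hedged sketch) is to restrict Open MPI to a single rail at a time via the openib BTL's interface-include parameter; the device name `mlx4_0` is an example, not taken from the thread:

```shell
# Pin the openib BTL to one HCA to see whether the internal error
# follows a particular rail (device names are examples; check `ibstat`).
if command -v mpirun >/dev/null 2>&1; then
    mpirun --mca btl openib,vader,self \
           --mca btl_openib_if_include mlx4_0 \
           -np 2 hostname
else
    echo "mpirun not found"
fi
```

Repeating the run with the second HCA in `btl_openib_if_include` can help decide whether the failure is tied to one rail or appears on both.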