Would it be possible to get a backtrace from one of the crashes? It would
be especially helpful if you could rebuild Open MPI with --enable-debug
added to the configure options.


On Wed, Apr 1, 2015 at 1:09 PM, Thomas Klimpel <jacques.gent...@gmail.com>
wrote:

> > You might double-check by running with "--mca btl ^openib" to see if
> > that is the source of the warning
>
> The warning appears always, independent of the interconnect, and even when
> running with "--mca btl ^openib".
>
>
> > Does it only crash when you pause it? Or does it crash while normally
> > running?
>
> It is very hard to reproduce without pausing. It crashes in only 1 out of
> 5 runs, after half an hour, for a run that would take 36 hours. Smaller
> test cases seem never to crash on their own, but when I pause, even quite
> small test cases (less than a minute) crash if I have more than 72 workers.
