Hi Robert,

On 27/10/2018 10:20, Robert Sawko wrote:
> Do you think it will help if I force OpenMPI to use TCP instead of
> InfiniBand? It should then avoid that specific function ucm_*.
> 
> Isn't it surprising, though, that other examples work fine and that
> the said example works on the login node? Surely the hooking is the
> same?
> 
> I understand that this is all runtime stuff, but do you think that my,
> perhaps unusual, marriage of anaconda and lmod may be causing it? I use
> lmod to account for the compiler-MPI hierarchy, but perhaps putting
> anaconda into my gcc/6.4, openmpi/3.1 branch doesn't make much sense.
> 
> Finally, I will also try downgrading openmpi as I am almost sure that
> only a few months ago I was running on P100 without putting any
> thought into it.

Falling back to TCP may help.  However, this can also come with
substantial performance implications.  My advice would therefore be to
build OpenMPI yourself.  This way you can be sure that no libraries are
hooking themselves into application code.
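
If you want to try the TCP fallback before rebuilding, a minimal sketch
follows.  It assumes your OpenMPI build honours the usual OMPI_MCA_*
environment variables and that selecting the ob1 PML with the tcp and
self BTLs is enough to keep UCX (and hence ucm_*) out of the picture on
your system; I have not verified this on your cluster.  The helper name
tcp_only_env is made up for illustration.

```python
import os


def tcp_only_env(base=None):
    """Return a copy of an environment with Open MPI steered towards TCP.

    Assumption: the Open MPI build reads OMPI_MCA_* variables, which is
    the standard way to set MCA parameters without mpirun flags.
    """
    env = dict(os.environ if base is None else base)
    # Select the ob1 point-to-point layer instead of the UCX one.
    env["OMPI_MCA_pml"] = "ob1"
    # Restrict the byte-transfer layers to TCP plus loopback-to-self.
    env["OMPI_MCA_btl"] = "tcp,self"
    return env


if __name__ == "__main__":
    # Print shell export lines that can be pasted into a job script.
    for key, value in sorted(tcp_only_env({}).items()):
        print(f"export {key}={value}")
```

The variables need to be in place before MPI is loaded, so export them
in the job script (or set them in os.environ at the very top of a
wrapper) rather than part-way through a run.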

Regards, Freddie.

> 
> Best wishes,
> Robert
> --
> Dr Robert Sawko
> Research Staff Member, IBM
> Daresbury Laboratory
> Keckwick Lane, Warrington
> WA4 4AD
> United Kingdom
> --
> Email (IBM): [email protected]
> Email (STFC): [email protected]
> Phone (office): +44 (0) 1925 60 3967
> Phone (mobile): +44 778 830 8522
> Profile page:
> http://researcher.watson.ibm.com/researcher/view.php?person=uk-RSawko
> --
> 
> [email protected] wrote: -----
> To: [email protected]
> From: Freddie Witherden 
> Sent by: [email protected]
> Date: 10/26/2018 06:28PM
> Subject: Re: [pyfrmailinglist] Cylinder 3D case freezes on P100
> 
> Hi Robert,
> 
> Looking at the stack trace it appears as if something is hooking
> malloc/free (probably MPI or some related library).  This is almost
> always a bad idea as such code is extremely difficult to get right.
> PyFR is particularly sensitive to such hooking on account of the fact
> that we load MPI and friends at runtime.  Thus, the hooking is done
> after a large number of pointers have already been allocated by the
> original (un-hooked) malloc.  When these pointers are later freed the
> hooked free often mistakenly believes they came from the hooked malloc.
> Hilarity ensues.
> 
> In my experience there is usually a way to prevent such hooking.
> 
> Regards, Freddie.
> 
