Freddie,

Thanks for the quick reply.

Do you think it would help if I forced OpenMPI to use TCP instead of InfiniBand?
It should then avoid the ucm_* functions in question.
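
For concreteness, here is a minimal sketch of a TCP-only setup. It relies on
Open MPI reading MCA parameters from OMPI_MCA_<name> environment variables. The
values chosen (pml=ob1, btl=tcp,self) are an assumption about what would bypass
the InfiniBand path on this system, and the helper name force_tcp is purely
illustrative. Whether this also avoids the malloc/free hooking is untested.

```python
import os

# Assumed settings: OMPI_MCA_<name> is Open MPI's environment-variable form
# of "--mca <name> <value>". The particular values are an untested guess.
TCP_ONLY = {
    "OMPI_MCA_pml": "ob1",       # point-to-point layer built on BTLs
    "OMPI_MCA_btl": "tcp,self",  # TCP between ranks, "self" for loopback
}


def force_tcp(env=os.environ):
    """Write the TCP-only MCA parameters into env and return what was set."""
    env.update(TCP_ONLY)
    return dict(TCP_ONLY)


if __name__ == "__main__":
    # This has to happen before MPI is initialised, e.g. before mpi4py
    # is imported by the application.
    for key, value in force_tcp().items():
        print(f"{key}={value}")
```

The same variables could equally be exported in the job script before the
solver is launched; the Python form is only convenient because MPI is loaded
at runtime.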

Isn't it surprising, though, that the other examples work fine and that the
example in question works on the login node? Surely the hooking is the same.

I understand that this is all runtime stuff, but do you think that my perhaps
unusual marriage of Anaconda and Lmod may be causing it? I use Lmod to account
for the compiler-MPI hierarchy, but perhaps putting Anaconda into my gcc/6.4,
openmpi/3.1 branch doesn't make much sense.

Finally, I will also try downgrading OpenMPI, as I am almost sure that only a
few months ago I was running on a P100 without giving it any thought.

Best wishes,
Robert
--
Dr Robert Sawko
Research Staff Member, IBM
Daresbury Laboratory
Keckwick Lane, Warrington
WA4 4AD
United Kingdom
--
Email (IBM): [email protected]
Email (STFC): [email protected]
Phone (office): +44 (0) 1925 60 3967
Phone (mobile): +44 778 830 8522
Profile page:
http://researcher.watson.ibm.com/researcher/view.php?person=uk-RSawko
--

[email protected] wrote: -----
To: [email protected]
From: Freddie Witherden 
Sent by: [email protected]
Date: 10/26/2018 06:28PM
Subject: Re: [pyfrmailinglist] Cylinder 3D case freezes on P100

Hi Robert,

Looking at the stack trace it appears as if something is hooking
malloc/free (probably MPI or some related library).  This is almost
always a bad idea as such code is extremely difficult to get right.
PyFR is particularly sensitive to such hooking on account of the fact
that we load MPI and friends at runtime.  Thus, the hooking is done
after a large number of pointers have already been allocated by the
original (un-hooked) malloc.  When these pointers are later freed the
hooked free often mistakenly believes they came from the hooked malloc.
Hilarity ensues.

In my experience there is usually a way to prevent such hooking.

Regards, Freddie.

-- 
You received this message because you are subscribed to the Google Groups "PyFR 
Mailing List" group.
To unsubscribe from this group and stop receiving emails from it, send an email 
to [email protected].
To post to this group, send an email to [email protected].
Visit this group at https://groups.google.com/group/pyfrmailinglist.
For more options, visit https://groups.google.com/d/optout.


[attachment "signature.asc" removed by Robert Sawko/UK/IBM]

Unless stated otherwise above:
IBM United Kingdom Limited - Registered in England and Wales with number 741598.
Registered office: PO Box 41, North Harbour, Portsmouth, Hampshire PO6 3AU

