Freddie,

Thanks for the quick reply.

Do you think it would help if I forced Open MPI to use TCP instead of InfiniBand? That should avoid the ucm_* functions. Isn't it surprising, though, that other examples work fine and that this example works on the login node? Surely the hooking is the same?

I understand that this is all runtime stuff, but do you think that my perhaps unusual marriage of Anaconda and Lmod may be causing it? I use Lmod to account for the compiler-MPI hierarchy, but perhaps putting Anaconda into my gcc/6.4, openmpi/3.1 branch doesn't make much sense.

Finally, I will also try downgrading Open MPI, as I am almost sure that only a few months ago I was running on P100 without putting any thought into it.

Best wishes,
Robert

--
Dr Robert Sawko
Research Staff Member, IBM
Daresbury Laboratory
Keckwick Lane, Warrington WA4 4AD
United Kingdom
--
Email (IBM): [email protected]
Email (STFC): [email protected]
Phone (office): +44 (0) 1925 60 3967
Phone (mobile): +44 778 830 8522
Profile page: http://researcher.watson.ibm.com/researcher/view.php?person=uk-RSawko

-- [email protected] wrote: -----
To: [email protected]
From: Freddie Witherden
Sent by: [email protected]
Date: 10/26/2018 06:28PM
Subject: Re: [pyfrmailinglist] Cylinder 3D case freezes on P100

Hi Robert,

Looking at the stack trace it appears as if something is hooking malloc/free (probably MPI or some related library). This is almost always a bad idea as such code is extremely difficult to get right.

PyFR is particularly sensitive to such hooking on account of the fact that we load MPI and friends at runtime. Thus, the hooking is done after a large number of pointers have already been allocated by the original (un-hooked) malloc. When these pointers are later freed the hooked free often mistakenly believes they came from the hooked malloc. Hilarity ensues.

In my experience there is usually a way to prevent such hooking.

Regards, Freddie.
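
On the question of forcing Open MPI onto TCP, here is a minimal sketch of one way it might be tried from Python. It relies on Open MPI's convention of reading MCA parameters from environment variables named OMPI_MCA_<param>, and selects the ob1 point-to-point layer with the tcp and self transports. The helper name tcp_only_env is hypothetical, and whether this actually sidesteps the ucm_* hooks on a given build is an assumption that would need testing.

```python
import os


def tcp_only_env(base=None):
    """Return a copy of an environment that asks Open MPI to use TCP only.

    Hypothetical helper for illustration; it does not launch anything.
    """
    env = dict(os.environ if base is None else base)
    # Select the ob1 point-to-point layer rather than a UCX-based one.
    env["OMPI_MCA_pml"] = "ob1"
    # Restrict byte-transfer layers to TCP plus loopback-to-self.
    env["OMPI_MCA_btl"] = "tcp,self"
    return env


if __name__ == "__main__":
    env = tcp_only_env()
    print(env["OMPI_MCA_pml"], env["OMPI_MCA_btl"])
    # A launch might then look like the following (file names are placeholders):
    # subprocess.run(["mpirun", "-n", "4", "pyfr", "run", "-b", "cuda",
    #                 "mesh.pyfrm", "config.ini"], env=env, check=True)
```

The variables must be set before MPI is initialised, so they should be in place before the job is launched rather than set from inside a running solver.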
--
You received this message because you are subscribed to the Google Groups "PyFR Mailing List" group.
To unsubscribe from this group and stop receiving emails from it, send an email to [email protected].
To post to this group, send an email to [email protected].
Visit this group at https://groups.google.com/group/pyfrmailinglist.
For more options, visit https://groups.google.com/d/optout.
